Encyclical Bleg Redux

Ack!!!

Y’all are too kind!

I didn’t expect the kind of response I got down yonder (and by e-mail) in regard to my request for help stripping some material out of papal encyclical files.

It seems it would be best if I were more explicit about what I’m looking to have done, since that might help clarify matters.

First, though, thanks to all who wrote and offered help! Sorry I wasn’t more explicit the first time.

Here are the basics:

So far as I can see, there are three things that need to be stripped out:

1. Footnotes. In most of the encyclicals these are numeric strings set off by being superscripted with a hyperlink, though in at least one encyclical they are superscripted only. Searching for and deleting any superscripted numeric string, along with any hyperlink, should get rid of them.

2. Parenthetical remarks containing Scripture citations. These are more varied. Some contain only one Scripture citation. Some contain more than one. Some are preceded by "cf." just inside the opening parenthesis. It may be the case that *all* parenthetical remarks fall into this class, though I haven’t verified that (and may not be able to), but it seems to me that deleting anything in parentheses *if* it contains a colon (used in verse divisions) would likely do the trick.

3. I’m also thinking about whether the paragraph numbers should be deleted for the project I have in mind. If so, this would involve deleting any numeric string at the beginning of a paragraph that is followed by a period (e.g., "22.").
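
For anyone who wants to see the three rules spelled out, here is a rough sketch in Python. It is only an illustration, not a tested solution for the actual files: the footnote markup shown (`<sup>` tags, optionally wrapping an `<a>` link) is an assumption about how the superscripts are encoded, the function name `strip_encyclical` is made up for this example, and it only removes hyperlinks that sit inside a superscript rather than every hyperlink in the file.

```python
import re

# ASSUMPTION: footnote markers look like <sup><a href="...">3</a></sup>
# or plain <sup>3</sup>. The real files may encode superscripts differently.
FOOTNOTE = re.compile(
    r"<sup>\s*(?:<a\b[^>]*>)?\s*\d+\s*(?:</a>)?\s*</sup>", re.IGNORECASE
)

# Any parenthetical remark containing a colon, plus the whitespace before it.
COLON_PAREN = re.compile(r"\s*\([^()]*:[^()]*\)")

# A paragraph number such as "22." at the start of a line, optionally
# preceded by an opening <p> tag (which is kept).
PARA_NUMBER = re.compile(
    r"^(\s*(?:<p\b[^>]*>\s*)?)\d+\.\s+", re.IGNORECASE | re.MULTILINE
)


def strip_encyclical(text, drop_paragraph_numbers=False):
    """Apply the three stripping rules; paragraph numbers are optional."""
    text = FOOTNOTE.sub("", text)
    text = COLON_PAREN.sub("", text)
    if drop_paragraph_numbers:
        text = PARA_NUMBER.sub(r"\1", text)
    return text


sample = (
    '22. God is love<sup><a href="#fn1">1</a></sup> (cf. 1 Jn 4:8), '
    "as the Church teaches (always)."
)
print(strip_encyclical(sample, drop_paragraph_numbers=True))
# -> God is love, as the Church teaches (always).
```

Note that the colon rule would also delete an innocent parenthetical that happens to contain a colon, so it is worth reviewing what matches before deleting anything.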

I have all fourteen encyclicals in a zip file that I can send anyone willing to take a crack at it. The zip is about a meg in size, but for a sample file, HERE’S ONE.

I’m very thankful for the many offers that have come in regarding this, and want to thank everyone who’s commented or e-mailed offers!

I want to make sure that folks don’t do redundant work, so maybe folks could use the combox if they think they’d be up for this (assuming no more clarification is needed), and I’ll use the combox to take someone up on the offer so that there’s no confusion.

Again, thanks to all and much obliged!

Author: Jimmy Akin

Jimmy was born in Texas, grew up nominally Protestant, but at age 20 experienced a profound conversion to Christ. Planning on becoming a Protestant seminary professor, he started an intensive study of the Bible. But the more he immersed himself in Scripture the more he found to support the Catholic faith, and in 1992 he entered the Catholic Church. His conversion story, "A Triumph and a Tragedy," is published in Surprised by Truth. Besides being an author, Jimmy is the Senior Apologist at Catholic Answers, a contributing editor to Catholic Answers Magazine, and a weekly guest on "Catholic Answers Live."

8 thoughts on “Encyclical Bleg Redux”

  1. This should be fairly simple to do in perl, Jimmy. Although, it might be tough to find a regular expression that will eliminate the scripture verse parenthetical expressions but leave others if they happen to contain a ‘:’… but that can be worked on. The regular expression could look not only for ‘:’ but also for book names: Genesis, Luke, etc… or their abbreviations.
    Anyhow, I’m willing to take a crack at it. Whenever anyone has a problem, our standard response at work is “I can do that in 3 lines of perl”. 😉

  2. Seems that you might be able to use TextPad or UltraEdit and three different regular expressions to strip each of these out. I believe both of the applications above will do multi-file search and replace using regex. I used to process hundreds of files in seconds this way.

  3. In the case of parenthetical markings, the first thing I would do is test my scripture-reference finding code by having it print out every example of potential markings first.
    I’ll take a look at the one you posted, play with it over the weekend. Like I said in the post below, I’m supposed to be working on my perl skills for work, so I may even be able to count it as work hours 😉

  4. There are only six dozen (give or take) books in the Bible. If they’re all referenced the same way, you could test for that. Just in case there’s an innocent colon-containing parenthetical remark somewhere. 🙂

  5. Thanks for all the help, folks!
    Why don’t I e-mail Sean and see if he can take a crack at it?

  6. The document looks XHTML compliant. You might be able just to use existing XML editing software.

  7. Unfortunately, it’s not XHTML compliant. There are a lot of br tags with no closing tags and some p tags with no closing tags. But a relatively small perl script seems to have done the trick. For the computer geeks among you, I used HTML::Parser to do most of the dirty work.

  8. On a related geeky accessibility note, tonight I went on a quest to be able to “bookmark” the encyclicals. I can rarely read an entire one in just one sitting. I was thinking, “wouldn’t it be nice if the computer could keep track of where I was in this file?” Being a bit of an emacs text editor bigot, I found the following snippet that can be placed in the .emacs configuration file:
    (require 'saveplace)
    (setq-default save-place t)

    From there if you skim the encyclicals and letters as text files (say RosariumVirginisMariae.txt), the editor always puts the cursor where it last was when you saved the file or exited the editor. I wish web browsers would/could do this.
    Being a kitchen sink text editor, emacs has several similar bookmarking abilities, but this seemed the most transparent. It just “automatically” picks up where you left off. I’m sure other editors have similar features. I just fall into the emacs side of the geeky, yet eternal Emacs vs. Vi text editor Holy Wars.
    Anyhow, now I have one less excuse not to read JP2’s letters and encyclicals… 😉
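
Two suggestions from the comments above, testing for book names rather than just a colon and printing every candidate before deleting anything, can be combined in a short dry-run sketch. This is in Python rather than the Perl the commenters used, the `BOOKS` list is a partial, made-up sample of abbreviations (the encyclicals may abbreviate differently), and `classify_parentheticals` is a hypothetical helper name.

```python
import re

# ASSUMPTION: a partial, illustrative list of book abbreviations.
BOOKS = ["Gen", "Ex", "Ps", "Is", "Mt", "Mk", "Lk", "Jn",
         "Acts", "Rom", "Cor", "Gal", "Eph", "Heb", "Rev"]

# Any non-nested parenthetical remark.
PAREN = re.compile(r"\([^()]*\)")

# A book abbreviation, an optional period, then chapter:verse,
# e.g. "Jn 4:8" or "1 Cor. 13:4".
CITATION = re.compile(r"\b(?:" + "|".join(BOOKS) + r")\.?\s+\d+:\d+")


def classify_parentheticals(text):
    """Return (citations, others) without modifying the text."""
    citations, others = [], []
    for match in PAREN.finditer(text):
        remark = match.group(0)
        if CITATION.search(remark):
            citations.append(remark)
        else:
            others.append(remark)
    return citations, others


sample = ("Love is patient (cf. 1 Cor 13:4; Jn 15:12). "
          "The council met (note: in Rome) that year.")
citations, others = classify_parentheticals(sample)
print("Would delete:", citations)
print("Would keep:  ", others)
```

Here the remark "(note: in Rome)" contains a colon but no book-and-verse pattern, so it lands in the keep list, which is exactly the innocent case the commenters were worried about.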
