Ack!!!
Y’all are too kind!
I didn’t expect the kind of response I got down yonder (and by e-mail) in regard to my request for help stripping some material out of papal encyclical files.
It seems best that I be more explicit about what I’m looking to have done, since that might help clarify matters.
First, though, thanks to all who wrote and offered help! Sorry I wasn’t more explicit the first time.
Here are the basics:
So far as I can see, there are three things that need to be stripped out:
1. Footnotes. In most of the encyclicals these are numeric strings set off as superscripts with a hyperlink, though in at least one encyclical they are superscripted only. Searching for and deleting any superscripted numeric string, along with any hyperlink, should get rid of them.
2. Parenthetical remarks containing Scripture citations. These are more varied. Some contain only one Scripture citation. Some contain more than one. Some are preceded by "cf." just inside the opening parenthesis. It may be the case that *all* parenthetical remarks fall into this class, though I haven’t verified that (and may not be able to), but it seems to me that deleting anything in parentheses *if* it contains a colon (used in verse divisions) would likely do the trick.
3. I’m also thinking about whether the paragraph numbers should be deleted for the project I have in mind. If so, this would involve deleting any numeric string at the beginning of a paragraph that is followed by a period (e.g., "22.").
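For anyone who wants to take a crack at it, the three rules above could be roughed out in Python along these lines. This is only a sketch under some assumptions of mine: that the files have been saved as HTML, that the footnote numbers sit inside `<sup>` tags (with or without an `<a>` link wrapped around them), and that each paragraph starts on its own line. The function name `strip_encyclical` and the `drop_paragraph_numbers` switch are made up for illustration. Note that it only removes hyperlinks that wrap a footnote number, not every hyperlink in the file.

```python
import re

# Rule 1 (assumes HTML): a superscripted number, optionally wrapped in a link.
SUP_FOOTNOTE = re.compile(
    r"<sup[^>]*>\s*(?:<a[^>]*>)?\s*\d+\s*(?:</a>)?\s*</sup>",
    re.IGNORECASE,
)

# Rule 2: a parenthetical remark that contains a colon (e.g. "cf. 1 Jn 4:8").
# The space before the opening parenthesis is removed along with it.
PAREN_WITH_COLON = re.compile(r"\s*\([^()]*:[^()]*\)")

# Rule 3: a number followed by a period at the start of a line (e.g. "22.").
PARA_NUMBER = re.compile(r"^[ \t]*\d+\.[ \t]*", re.MULTILINE)


def strip_encyclical(text, drop_paragraph_numbers=False):
    """Apply the stripping rules; paragraph numbers are removed only on request."""
    text = SUP_FOOTNOTE.sub("", text)
    text = PAREN_WITH_COLON.sub("", text)
    if drop_paragraph_numbers:
        text = PARA_NUMBER.sub("", text)
    return text


sample = (
    '22. God is love<sup><a href="#fn1">1</a></sup> '
    "(cf. 1 Jn 4:8; Jn 3:16), as the Church (always) teaches.<sup>2</sup>"
)
print(strip_encyclical(sample, drop_paragraph_numbers=True))
```

Parentheses without a colon, like "(always)" in the sample, are left alone, which matches the colon test described in item 2. Any parenthetical that has a colon for some other reason would be deleted too, so the output would want a quick read-through.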
I have all fourteen encyclicals in a zip file that I can send anyone willing to take a crack at it. The zip is about a meg in size, but for a sample file, HERE’S ONE.
I’m very thankful for the many offers that have come in regarding this, and want to thank everyone who’s commented or e-mailed offers!
I want to make sure that folks don’t do redundant work, so maybe folks could use the combox if they think they’d be up for this (assuming no more clarification is needed), and I’ll use the combox to take someone up on the offer so that there’s no confusion.
Again, thanks to all and much obliged!