I’m looking for an application that will allow me to download the entire contents of a web site in one throw. In the old days, I would have used GoZilla for this purpose, but GoZilla seems to have fallen on hard times.
Ideally, I’d like to be able to download just the pages on a site that contain particular keywords, but the whole site would do.
Can anybody recommend an application that does this that (a) runs on Windows XP, (b) ain’t spyware, and (c) is free- or shareware?
Thanks much, folks!
Author: Jimmy Akin
Have you tried Internet Explorer’s offline favorites feature?
Looks to me like the Scrapbook extension for Firefox might be what you’re looking for:
https://addons.mozilla.org/firefox/427/
Secret project #4?
No, this has nothing to do with secret project #4. That’s something I am working on at the moment (with a bunch of other people), though.
Jimmy,
If you don’t mind using the command line, you could try Wget. The main page is here and the Windows port is here.
At THIS MOMENT!!!??? WOW!! 😉
Oops. Forgot a closing tag there. Anyway, both links still work (in Firefox at least), only the first spills over to the edge of the second.
Yes, I second the suggestion of Publius. Try wget. It’s a command-line utility, which means that it can also be easily run from within a shell script (or batch file on MS-Windows).
I’d look into Jplucker. I don’t know if it would exactly fit your needs.
IsiloX? Freeware I believe.
Jimmy,
Last summer, we had a job managing some campgrounds. We used the free software at http://www.httrack.com/ and got to read an awful lot of This Rock magazine articles…
enjoy!
If you use Firefox, you can give the “DownThemAll!” download manager extension a try. According to the website: “DownThemAll is absolutely freeware and open-source. No Adware, no Spyware.”
https://addons.mozilla.org/firefox/201/
Take a look at http://www.webstripper.net
WinHTTrack.
http://www.httrack.com/page/2/en/index.html
I use Firefox with the Spiderzilla extension based on WinHTTrack from http://www.httrack.com. WinHTTrack can be used without Firefox.
I don’t think DownThemAll will capture entire websites.
I have used WinHTTrack with good success to capture whole sites, several links deep if you want, and all links are made relative so they are browsable offline.
God bless, http://www.shiningpeak.com
I’ve used winhttrack too. Very good product.
I use webzip from Spidersoft. It works great and downloads everything exactly as it is on the server (which it sounds like you want). It has multiple options so you can set it to take what you want. The program is shareware with a free trial period. (If you get your hands on an old version, the trial period doesn’t end, and the old version works just fine if you are gathering text sites rather than database- or PHP-driven sites.)
http://www.spidersoft.com/
Jimmy, it sounds like you want a web mirroring program, not just a program that downloads chosen files or downloads all linked files in a particular page, some levels deep. A program that can do web mirroring synchronizes every file hosted in an online directory with a backup location on your computer.
So I don’t think you want the Firefox extension DownThemAll, because all that does is download everything linked on a given page. Scrapbook is nice, but similarly, it needs you to start on a given page and then tell it to get what is linked up to X levels deep. It depends on HTML links to be aware of content and grab it, so it’s still possible to not get the entire site with Scrapbook. I seriously doubt Scrapbook would be able to get everything from a site like catholic.com even if it was set to go many levels deep.
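To make the "X levels deep" point concrete, here is a minimal Python sketch of a depth-limited link crawl. It is not how Scrapbook or any other tool mentioned here is actually implemented; the `crawl` function and the little `site` map of page paths are made up purely for illustration. It only shows why a crawler that follows links from a start page can miss content.

```python
from collections import deque

def crawl(links, start, max_depth):
    """Return the pages reachable from `start` in at most `max_depth` link hops.

    `links` maps each page to the pages it links to (a stand-in for
    fetching the page and parsing its HTML links).
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# Hypothetical site layout, invented for this example.
site = {
    "/index.html": ["/about.html", "/library/"],
    "/library/": ["/library/article1.html"],
    "/library/article1.html": ["/library/article2.html"],
    "/orphan.html": [],  # on the server, but no page links to it
}

shallow = crawl(site, "/index.html", 1)   # misses both articles
deep = crawl(site, "/index.html", 10)     # finds the articles, never /orphan.html
```

However deep you set the limit, a page that nothing links to is never discovered this way.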
A cursory search on Tucows gave me the shareware result “AJC Directory Synchronizer”. I’m sure there are many other site mirroring tools. I think you’ll want to use the terms “site” and “mirror” in your search criteria on download sites such as Tucows or Download.com, to get specifically what you’re looking for as opposed to the rest.
Oh, just wanted to add: Besides “site” and “mirror”, try additionally the search term “synchronize”.
I also found this page for you which can give you an intro to mirroring: http://www.boutell.com/newfaq/creating/mirroring.html
They recommend wget. Here’s how it says to use wget to mirror a site:
where http://xxx.yyy of course would be replaced with, say, http://www.catholic.com
I’d create a new folder just for this first, to keep your downloads tidy, and then change into the new folder at your command prompt before issuing the wget command.
(If Catholic.com is the site you want to mirror, make sure to use the “www” in the URL to avoid downloading a bunch of forum posts from forums.catholic.com, assuming you don’t want to grab those.)
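For anyone who would rather script it, here is a rough Python sketch that assembles a wget mirroring command and runs it. This is not necessarily the exact command the FAQ page gives. It uses standard wget options (`--mirror`, `--convert-links`, `--page-requisites`, `--no-parent`, `--wait`), and it assumes wget is installed and on your PATH. The function names and the one-second wait are my own choices for illustration.

```python
import subprocess

def wget_mirror_command(url, wait_seconds=1):
    """Build the argument list for mirroring `url` with wget."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the copy browses offline
        "--page-requisites",   # also fetch images, stylesheets, etc.
        "--no-parent",         # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests (assumed value)
        url,
    ]

def run_mirror(url):
    """Run wget in the current folder; raises if wget exits with an error."""
    subprocess.run(wget_mirror_command(url), check=True)

# Example (not executed here): run_mirror("http://www.catholic.com/")
```

Run it from inside the new folder, since wget writes the mirrored files under the current directory.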
http://pagesucker.com/ is shareware that has a free demo. It is easy to run and will put the website into a file.
I’d second the recommendation for wget if you’re looking for something scriptable/repeatable. It has loads of parameters/switches for tweaking how it crawls through a website and what it does with the resulting files. I’ve only ever played with it in the Unix/Linux/OSX domain, but I’m sure there are Windows ports available.
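On the original wish to keep only pages that contain particular keywords: as far as I know, wget's accept/reject switches filter on file names and URLs rather than on page content, so one workaround is to mirror the whole site first and then scan the local copy. A minimal Python sketch follows. The function name, folder name, and keywords are placeholders, and it matches against the raw HTML (markup included), so treat it as a starting point only.

```python
from pathlib import Path

def pages_with_keywords(mirror_dir, keywords):
    """Return the .html/.htm files under `mirror_dir` containing any keyword.

    Matching is case-insensitive and done on the raw file text.
    """
    wanted = [k.lower() for k in keywords]
    matches = []
    for path in sorted(Path(mirror_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if any(k in text for k in wanted):
                matches.append(path)
    return matches

# Example (placeholder folder and keywords):
# for page in pages_with_keywords("www.catholic.com", ["baptism", "eucharist"]):
#     print(page)
```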
I would like to find something that would accomplish the same job, but for Safari (on a Mac of course). Any ideas?
Free utility that I’ve used for years
http://www.webreaper.net/
Michael
See http://www.isilo.com. It does what you ask for, and its output can be read on PC, Pocket PC, Palm, and Linux (with the appropriate reader for each platform).
I think Adobe Acrobat Pro v7.0 can do what you’re talking about. I do something similar with Adobe Acrobat Standard 5.0, and I think 7.0 pro has this feature and then some.
Of course, I’m pretty sure it will download them as PDFs, and that would present some difficulty if you wanted to convert them over to another format, I think. Also, Adobe software is pricey.
So, FWIW,