Google Books Downloader

August 17th, 2012

A few months ago, I started a pet project to download free books available at Google Books.

My main goal when I started this project was to learn more about JavaScript-scriptable headless browsers (PhantomJS in this case), using CasperJS to drive it.

CasperJS was at version 0.6.6 back then, and it had a nasty bug that prevented us from downloading binary files with the method, so my script initially saved a list of the pages in a wget-friendly format.
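
To give an idea of what "wget-friendly" means here, this is a rough sketch and not the actual code from the script: wget can read URLs from a plain text file, one per line, so the list only needs to be de-duplicated and joined with newlines. The `toWgetList` helper and the example.com URLs are made up for illustration.

```javascript
// Hypothetical helper (not taken from the real script): builds the body of a
// text file that wget can consume with its -i option, one URL per line.
function toWgetList(urls) {
  const seen = new Set();
  const lines = [];
  for (const url of urls) {
    // Skip pages that were already collected, keeping the original order.
    if (!seen.has(url)) {
      seen.add(url);
      lines.push(url);
    }
  }
  // End the file with a newline when there is at least one URL.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

// Placeholder URLs, assumed for the example only.
const pageUrls = [
  "http://example.com/book/page-1.png",
  "http://example.com/book/page-2.png",
  "http://example.com/book/page-1.png",
];

const listBody = toWgetList(pageUrls);
console.log(listBody);
```

Saving that output to a file (say `pages.txt`) and running `wget -i pages.txt` would then fetch every page in the list.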

Now that the bug is fixed, I updated the script to save the pages automatically.

I still haven’t implemented any command-line switches, so the book to download is still hardcoded (spoiler alert: the book is actually a 1964 issue of Popular Science magazine, which has a very interesting article on automation :), and you can’t choose between saving the wget-friendly list and saving the pages automatically.

Feel free to fork the project, improve it and send me pull requests :)