I'd like to develop a "page downloader" in Ruby: something that, given a URL, downloads the HTML along with the associated CSS, image files, and JavaScript files, and then changes the HTML to reference the local copies instead of the remote ones. This is much like the "Save as complete page" option that some browsers offer.
I was thinking about using Nokogiri for the initial parsing of the page, but I'm not sure it's the best tool for the job:
- Can it get a list of external dependencies (stylesheets, images, and JavaScript files)? I don't care about dependencies generated by JavaScript at runtime.
- Does it parse CSS? I might also want to download images referenced from the stylesheets, or stylesheets pulled in with `@import`.
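On the CSS question: as far as I know, Nokogiri parses HTML and XML and uses CSS only as a *selector* language for queries. It does not parse stylesheet contents. For the narrow job of finding `url(...)` and `@import` references, a regex pass may be enough. Below is a rough sketch in plain Ruby (stdlib `uri` only). It assumes reasonably simple stylesheets, and `css_dependencies` is a hypothetical helper name. A real CSS parser would handle edge cases this misses, such as escaped characters or quoted URLs containing `)`.

```ruby
require "uri"

# Matches url(...) references, quoted or unquoted.
URL_FUNC = /url\(\s*(['"]?)([^'")]+)\1\s*\)/i
# Matches @import "file.css"; the @import url(...) form is caught by URL_FUNC.
IMPORT_STRING = /@import\s+(['"])([^'"]+)\1/i

# Hypothetical helper: returns absolute URLs of images, fonts, and @imported
# stylesheets referenced by a stylesheet, resolved against the stylesheet's
# own URL (not the page's URL).
def css_dependencies(css, css_url)
  css = css.gsub(%r{/\*.*?\*/}m, "") # drop comments so commented-out rules are skipped
  refs = css.scan(URL_FUNC).map { |_, ref| ref } +
         css.scan(IMPORT_STRING).map { |_, ref| ref }
  refs.map(&:strip)
      .reject { |ref| ref.start_with?("data:") } # inline data needs no download
      .map { |ref| URI.join(css_url, ref).to_s }
      .uniq
end
```

Any `@import`ed stylesheets found this way would need the same treatment recursively, and the stylesheet text itself would need its references rewritten to the local copies.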
Is there a gem that already does what I want?