Using WWW::Mechanize to download a file to disk without loading it all in memory first

Question

I'm using Mechanize to facilitate the downloading of some files. At the moment my script uses the following line to actually download the files...

agent.get('http://example.com/foo').save_as 'a_file_name'

However, this downloads the complete file into memory before dumping it to disk. How do I bypass this behavior and download straight to disk? If I need to use something other than WWW::Mechanize, how would I go about using WWW::Mechanize's cookies with it?

Please note that the `Mechanize::File` class is not appropriate for large files. In those cases, one should use the `Mechanize::Download` class instead, as it downloads the content in small chunks to disk. Check [here](http://www.rubydoc.info/gems/mechanize/Mechanize/PluggableParser) and [here](http://www.rubydoc.info/gems/mechanize/Mechanize/Download) for more details. — nunop, Sep 06 '16 at 23:40

score 38 · Answer 1 · answered Feb 01 '12 at 23:48


What you really want is Mechanize::Download:

http://mechanize.rubyforge.org/Mechanize/Download.html

You can use it like this:

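require 'mechanize'

agent = Mechanize.new
# Handle every response with Mechanize::Download instead of the default parser
agent.pluggable_parser.default = Mechanize::Download
# Fetch the URL and write the body to the given file name
agent.get('http://example.com/foo').save('a_file_name')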
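
As a rough illustration of what saving in chunks means, here is a minimal sketch that copies an IO-like body to disk one block at a time. `copy_in_chunks` is a hypothetical helper, not part of Mechanize, and the 16 KiB chunk size is an arbitrary assumption. Calling `save` on a `Mechanize::Download` already takes care of this, so the sketch only shows the idea.

```ruby
# Hypothetical helper, not part of Mechanize: copies source_io to path in
# fixed-size chunks so that only one chunk is held in memory at a time.
def copy_in_chunks(source_io, path, chunk_size = 16 * 1024)
  bytes_written = 0
  File.open(path, 'wb') do |out|
    # IO#read(length) returns nil once the end of the stream is reached.
    while (chunk = source_io.read(chunk_size))
      out.write(chunk)
      bytes_written += chunk.bytesize
    end
  end
  bytes_written
end
```

With the agent configured as above, something like `copy_in_chunks(agent.get(url).body_io, 'a_file_name')` should behave similarly, assuming the response object exposes `body_io` the way `Mechanize::Download` does.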

answered Feb 01 '12 at 23:48

Renato


I would add that I used exactly your solution, except that I had Mechanize::FileSaver instead of Mechanize::Download, and that didn't work: files were saved to disk but without any contents (0 KB). I replaced it with Download and everything works perfectly :) Thanks – Mik378 Aug 16 '12 at 01:24

Where does the file get saved? – carbonr Mar 27 '13 at 12:11

@carbonr with `agent.get(url).save(File.join(dir, filename))` the file will be saved into the dir you specify. – bfcoder Nov 24 '16 at 14:40
You could save the file to the ~/Downloads folder with this code: `agent.get(download_url).save(File.join(Dir.home, 'Downloads', file_name))` – Honghao Z Jun 09 '17 at 01:30
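
To expand on where the file ends up: a relative name such as 'a_file_name' is resolved against the script's current working directory, so passing an absolute path makes the destination explicit. The sketch below is one way to build such a path. `destination_for` is a hypothetical helper name, not something Mechanize provides.

```ruby
require 'fileutils'

# Hypothetical helper, not part of Mechanize: returns an absolute path for
# file_name inside dir, creating dir first if it does not exist yet.
def destination_for(dir, file_name)
  absolute_dir = File.expand_path(dir) # also expands a leading "~"
  FileUtils.mkdir_p(absolute_dir)
  # Keep only the last path component so the file always lands inside dir.
  File.join(absolute_dir, File.basename(file_name))
end

# Intended use with the agent from the answer (needs the mechanize gem):
#   agent.get(download_url).save(destination_for('~/Downloads', file_name))
```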

score 2 · Answer 2 · edited Oct 09 '17 at 11:23


Have you looked at Mechanize::FileSaver? It looks like it can do what you require.

Here is an example that saves all the PDF files it encounters:

require 'rubygems'
require 'mechanize'

agent = Mechanize.new
agent.pluggable_parser.pdf = Mechanize::FileSaver
agent.get('http://example.com/foo.pdf')

edited Oct 09 '17 at 11:23

Nemo


answered Dec 06 '10 at 05:36

Gerhard

