
I am using rest-client to download a large page (around 1.5 GB in size). The retrieved value is held in memory and then saved to a file. As a result, my program crashes with "failed to allocate memory" (NoMemoryError).

But there is no need to keep this data in memory; it could be saved directly to disk.

I found "You can: (...) manually handle the response (e.g. to operate on it as a stream rather than reading it all into memory) See RestClient::Request's documentation for more information." on https://github.com/rest-client/rest-client. Unfortunately, after reading http://www.rubydoc.info/gems/rest-client/1.7.3/RestClient/Request I have no idea how this can be accomplished.

I am also aware that I could use another library (see "Using WWW:Mechanize to download a file to disk without loading it all in memory first"), but my program already uses rest-client.

Simplified code:

data = RestClient::Request.execute(:method => :get, :url => url, :timeout => 3600)
file = File.new(filename, 'w')
file.write data
file.close

Code - https://github.com/mkoniecz/CartoCSSHelper/blob/395deab626209bcdafd675c2d8e08d0e3dd0c7f9/downloader.rb#L126

reducing activity
  • http://www.rubydoc.info/gems/rest-client/1.7.3/RestClient/Request#fetch_body-instance_method Read the source of this method and implement something similar that stores the data directly in a file. – iced Mar 12 '15 at 12:57
  • You may use the builtin library OpenURI: `require 'open-uri'; File.open(path, 'w') { |f| IO.copy_stream(open(url), f) }`. If the file is large `open` will automatically write it to a Tempfile and return. If the file is small enough it will write it in a `StringIO`. Either way you will have an io object that you can block-copy to desired location. – Morozov Dec 09 '15 at 13:52
  • Did you solve this with RestClient? I have a similar problem but can't use open-uri as it doesn't support POST requests. – Raoot Apr 13 '16 at 05:17

2 Answers


Another way is to use raw_response. This saves the response body directly to a file (usually in /tmp) and handles redirects without a problem. See Streaming Responses in the rest-client README. Here's their example:

>> raw = RestClient::Request.execute(
           method: :get,
           url: 'http://releases.ubuntu.com/16.04.2/ubuntu-16.04.2-desktop-amd64.iso',
           raw_response: true)
=> <RestClient::RawResponse @code=200, @file=#<Tempfile:/tmp/rest-client.20170522-5346-1pptjm1>, @request=<RestClient::Request @method="get", @url="http://releases.ubuntu.com/16.04.2/ubuntu-16.04.2-desktop-amd64.iso">>
>> raw.file.size
=> 1554186240
>> raw.file.path
=> "/tmp/rest-client.20170522-5346-1pptjm1"
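Since the body ends up in a Tempfile, it still has to be moved to its final location. A minimal sketch of that step, assuming `raw.file` is a Tempfile as in the output above; `persist_download` is a hypothetical helper name, not part of rest-client:

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper (not part of rest-client): moves a finished download
# from its temporary location to `destination` and returns the new path.
def persist_download(tempfile, destination)
  tempfile.close                            # flush any buffered data first
  FileUtils.mv(tempfile.path, destination)  # rename, or copy + delete across filesystems
  destination
end

# Assumed usage with a raw response like the one above:
#   raw = RestClient::Request.execute(method: :get, url: url, raw_response: true)
#   persist_download(raw.file, filename)
```

Moving the file rather than reading it back in keeps memory use flat regardless of the download size.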
Carlos Fonseca

My original answer suggested passing a block to RestClient::Request#execute, but the block only receives the data once the full response has been read, which defeats the purpose. This is how to do it properly:

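require 'rest-client'

# Open in binary mode so the bytes are written exactly as received
File.open('/tmp/foo.iso', 'wb') {|f|
    # rest-client calls this proc with the raw response before the body is read
    block = proc { |response|
      # read_body yields the body chunk by chunk instead of buffering all of it
      response.read_body do |chunk|
        puts "Working on response"
        f.write chunk
      end
    }
    RestClient::Request.new(method: :get, url: 'http://mirror.pnl.gov/releases/xenial/ubuntu-16.04-server-amd64.iso', block_response: block).execute
}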

It is from the related rest-client project issue.

Note: redirection does not work in this mode, and you also lose the HTTP status, cookies, headers, etc. Hopefully this will be fixed some day.
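As far as I can tell, the object handed to the proc is the underlying Net::HTTPResponse, so the status can at least be checked by hand before anything is written. A rough sketch under that assumption; `streaming_writer` is a hypothetical helper name, and the only things it relies on are `#code` and `#read_body`:

```ruby
# Hypothetical helper (not part of rest-client): builds a proc for the
# block_response option that streams the body into `io`.
# Assumes the proc's argument behaves like Net::HTTPResponse,
# i.e. it responds to #code and #read_body.
def streaming_writer(io)
  proc do |response|
    code = response.code.to_i
    # Refuse to write error pages or redirect bodies into the target file
    raise "unexpected HTTP status #{code}" unless (200..299).cover?(code)

    # Write each chunk as it arrives instead of buffering the whole body
    response.read_body { |chunk| io.write(chunk) }
  end
end

# Assumed usage:
#   File.open(filename, 'wb') do |f|
#     RestClient::Request.execute(method: :get, url: url,
#                                 block_response: streaming_writer(f))
#   end
```

A redirect shows up here as a 3xx status and raises, so it still has to be followed manually.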

akostadinov