139

How do I download and save a binary file over HTTP using Ruby?

The URL is http://somedomain.net/flv/sample/sample.flv.

I am on the Windows platform and I would prefer not to run any external program.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Radek
  • 13,813
  • 52
  • 161
  • 255
  • My solution is strongly based on http://snippets.dzone.com/posts/show/2469 which appeared after I typed __ruby file download__ in FireFox address bar...so did You do any research on the internet before asking this question? – Dawid Feb 15 '10 at 01:17
  • @Dejw: I did research and found an answered question here. Basically with the same code you gave me. The `resp.body` part is confusing me I thought it would save only 'body' part of the response but I want to save whole/binary file. I also found that http://rio.rubyforge.org/ could be helpful. Moreover with my question nobody can say that such question was not answered yet :-) – Radek Feb 15 '10 at 01:23
  • 3
    The body part is exactly whole file. Response is created from headers (http) and body (the file), so when You saves the body You saved the file ;-) – Dawid Feb 15 '10 at 01:54
  • 1
    one more question... let's say the file is 100MB big and the download process get interrupted in the middle. Is there going to be anything saved? Can I do resume of the file? – Radek Feb 15 '10 at 02:28
  • Unfortunately not, because `http.get('...')` call sends a request and receives response (the whole file). To download a file in chunks and save it simultaneously see my edited answer below ;-) Resuming is not easy, maybe You count bytes You saved and then skip them when You redownload the file (`file.write(resp.body)` returns the number of bytes written). – Dawid Feb 15 '10 at 10:33
  • I asked also to make sure that I will have either 100% downloaded file or nothing. I am happy with that now :-) Thank you – Radek Feb 15 '10 at 10:57
  • See also: http://stackoverflow.com/a/2517286/165673 – Yarin Jan 11 '14 at 05:44
  • http://www.railshorde.com/blog/ruby-download-files-over-http – Animesh Jun 16 '15 at 19:17
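Resuming an interrupted download, as discussed in the comments above, can also be done with an HTTP Range header, which asks the server for only the missing bytes. A hedged sketch (resume_download is a hypothetical helper, not from any answer below; it appends only when the server answers 206 Partial Content, and servers are not required to support ranges):

```ruby
require 'net/http'

# Hypothetical resume helper: if a partial file already exists,
# request only the remaining bytes via an HTTP Range header.
def resume_download(url, path)
  uri = URI.parse(url)
  offset = File.exist?(path) ? File.size(path) : 0
  Net::HTTP.start(uri.host, uri.port) do |http|
    req = Net::HTTP::Get.new(uri.request_uri)
    req['Range'] = "bytes=#{offset}-" if offset > 0
    http.request(req) do |resp|
      # Append only if the server honored the Range request (206);
      # otherwise start over with the full body.
      mode = resp.code == '206' ? 'ab' : 'wb'
      File.open(path, mode) do |f|
        resp.read_body { |chunk| f.write(chunk) }
      end
    end
  end
end
```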

9 Answers

149

The simplest way is the platform-specific solution:

#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`

What you are probably searching for is:

require 'net/http'
# The host must be "somedomain.net" (no trailing slash or path);
# passing "somedomain.net/" raises an exception.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."

Edit: Changed. Thank You.

Edit2: A solution that saves parts of the file while downloading:

# instead of http.get -- this assumes the http object from the
# Net::HTTP.start block above
f = open('sample.flv', 'wb')  # binary mode, or Windows will mangle the data
begin
    http.request_get('/flv/sample/sample.flv') do |resp|
        resp.read_body do |segment|
            f.write(segment)
        end
    end
ensure
    f.close()
end
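The begin/ensure pair can be dropped by using File.open's block form, which closes the file automatically even if an exception is raised mid-transfer. A sketch of that variant (download_chunked is a hypothetical wrapper, not part of the original answer):

```ruby
require 'net/http'

# Hypothetical wrapper around the chunked download: File.open's block
# form closes the file automatically, even on an exception.
def download_chunked(host, remote_path, local_path)
  Net::HTTP.start(host) do |http|
    File.open(local_path, 'wb') do |f|
      http.request_get(remote_path) do |resp|
        resp.read_body { |segment| f.write(segment) }
      end
    end
  end
end

# download_chunked('somedomain.net', '/flv/sample/sample.flv', 'sample.flv')
```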
Jake Lin
  • 11,146
  • 6
  • 29
  • 40
Dawid
  • 4,042
  • 2
  • 27
  • 30
  • the first "simple" solution method won't work on windows machines – srcspider Jan 17 '13 at 15:52
  • 15
    Yes, I know. That is why I said that it is `a platform-specific solution`. – Dawid Jan 17 '13 at 21:28
  • 1
    More platform-specific solutions: GNU/Linux platforms provide `wget`. OS X provides `curl` (`curl http://oh.no/its/pbjellytime.flv --output secretlylove.flv`). Windows has a Powershell equivalent `(new-object System.Net.WebClient).DownloadFile('http://oh.no/its/pbjellytime.flv','C:\tmp\secretlylove.flv')`. Binaries for wget and curl exist for all operating system via download as well. I still highly recommend using the standard library unless your writing code solely for your own lovin'. – fny Jan 23 '13 at 12:51
  • PowerShell 3 has `Invoke-WebRequest`: `iwr $url -OutFile $path` – Simon Buchan Jul 05 '13 at 06:09
  • 1
    the begin ... ensure ... end is not necessary if the open block form is used. open 'sample.flv' do |f| .... f.write segment – lab419 Dec 17 '14 at 14:52
  • 1
    The non-text file arrives corrupted. – Paul Dec 22 '14 at 13:15
  • How can we make this multi threading if the filename is provided as method argument? Or is this solution multi threaded by default. Im using Rails 4 and PUMA server and in need to store mutiple files at the same time. – Rubytastic Apr 29 '15 at 09:09
  • 1
    I use chunked download using `Net::HTTP`. And I receive the part of the file but get response `Net::HTTPOK`. Is there any way to ensure we downloaded the file completely? – Nickolay Kondratenko Dec 04 '15 at 11:30
  • @NickolayKondratenko, this is in fact a good question. I wonder if there's any chance to get to the response headers to check size. – akostadinov Feb 26 '17 at 21:05
  • 1
    Rubocop marks just calling `open` as potentially unsafe. Better use `File.open` instead. See: https://rubocop.readthedocs.io/en/latest/cops_security/#securityopen – mvherweg Apr 26 '18 at 08:03
123

I know that this is an old question, but Google threw me here and I think I found a simpler answer.

In Railscasts #179, Ryan Bates used the Ruby standard library OpenURI to do much of what was asked, like this:

(Warning: untested code. You might need to change/tweak it.)

require 'open-uri'

File.open("/my/local/path/sample.flv", "wb") do |saved_file|
  # the following "open" is provided by open-uri
  open("http://somedomain.net/flv/sample/sample.flv", "rb") do |read_file|
    saved_file.write(read_file.read)
  end
end
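Note: since Ruby 2.5, open-uri also exposes URI.open, which avoids relying on the monkeypatched Kernel#open. A sketch of the same download wrapped in a hypothetical save_url helper:

```ruby
require 'open-uri'

# Hypothetical helper: URI.open (Ruby 2.5+) behaves like the
# monkeypatched Kernel#open but is explicit about where it comes from.
def save_url(url, path)
  File.open(path, 'wb') do |saved_file|
    URI.open(url, 'rb') do |read_file|
      saved_file.write(read_file.read)
    end
  end
end
```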
Alan W. Smith
  • 24,647
  • 4
  • 70
  • 96
kikito
  • 51,734
  • 32
  • 149
  • 189
  • 9
    `open("http://somedomain.net/flv/sample/sample.flv", 'rb')` will open the URL in binary mode. – zoli Sep 25 '12 at 19:21
  • 2
    anyone knows if open-uri is intelligent about filling the buffer as @Isa explained? – gdelfino Oct 26 '12 at 21:28
  • 1
    @gildefino You will get more answers if you open a new question for that. It is unlikely that many people will read this (and it is also the appropiate thing to do in Stack Overflow). – kikito Oct 26 '12 at 21:34
  • I daresay this is the cleaner, and thus better, solution. – J3RN Aug 13 '14 at 18:13
  • How can we make this multi threading if the filename is provided as method argument? Or is this solution multi threaded by default. Im using Rails 4 and PUMA server and in need to store mutiple files at the same time. – Rubytastic Apr 29 '15 at 09:09
  • 2
    Awesome. I had problems with `HTTP` => `HTTPS` redirection, and found out [how to solve it](http://stackoverflow.com/a/27411667/2752041) using [`open_uri_redirections` Gem](https://github.com/open-uri-redirections/open_uri_redirections) – mathielo May 29 '15 at 00:00
  • Worked perfectly for me. I used `:content_length_proc` and `:progress_proc` as well, though. (http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI/OpenRead.html) – dimitarvp Jul 24 '15 at 18:05
  • 3
    FWIW some people think that open-uri is dangerous because it monkeypatches all code, including library code, that uses `open` with a new ability that the calling code might not anticipate. You shouldn't be trusting user input passed to `open` anyway, but you need to be doubly careful now. – method Aug 03 '16 at 14:22
46

Here is my Ruby HTTP-to-file download using open(name, *rest, &block).

require "open-uri"
require "fileutils"

def download(url, path)
  case io = open(url)
  when StringIO then File.open(path, 'w') { |f| f.write(io.read) }
  when Tempfile then io.close; FileUtils.mv(io.path, path)
  end
end

The main advantage here is that it is concise and simple, because open does much of the heavy lifting. And it does not read the whole response into memory.

The open method will stream responses larger than 10 kB (10240 bytes) to a Tempfile. We can exploit this knowledge to implement a lean download-to-file method. See the OpenURI::Buffer implementation here.

Please be careful with user-provided input! open(name, *rest, &block) is unsafe if name comes from user input!

Use OpenURI::open_uri to avoid reading files from disk:

...
case io = OpenURI::open_uri(url)
...
Pavel Chuchuva
  • 22,633
  • 10
  • 99
  • 115
Overbryd
  • 4,612
  • 2
  • 33
  • 33
  • 4
    This should be the accepted answer as it's concise & simple & does not load the whole file in memory ~ + performance (guesstimate here). – Nikkolasg Sep 12 '16 at 13:56
  • I agree with Nikkolasg. I just tried to use it and it works very well. I modified it a bit though, for example, the local path will be deduced automatically from the URL given, so e. g. "path = nil" and then checking for nil; if it is nil, then I use File.basename() on the url to deduce the local path. – shevy Jul 03 '17 at 11:41
  • I wonder why it works correctly with `"w"`. Will it work on Windows or better put `"wb"` instead? – sekrett Dec 04 '17 at 10:59
  • 1
    This would be the best answer, but open-uri **DOES** load the whole file in memory https://stackoverflow.com/questions/17454956/how-to-get-http-headers-before-downloading-with-rubys-openuri – Simon Perepelitsa Aug 09 '18 at 19:30
  • 2
    @SimonPerepelitsa hehe. I revised it yet again, now providing a concise download-to-file method that **does not read the whole response** in memory. My previous answer would have been sufficient, because `open` actually does not read the response in memory, it reads it into a temporary file for any responses > 10240 bytes. So you were kind-a-right but not. The revised answer cleans up this misunderstanding and hopefully serves as a great example on the power of Ruby :) – Overbryd Aug 10 '18 at 17:18
  • 3
    If you get an `EACCES: permission denied` error when changing the filename with `mv` command its because you have to close the file first. Suggest changing that part to `Tempfile then io.close;` – David Douglas Oct 05 '18 at 09:50
  • Very useful! I was able to fetch a 10GB file and the ruby process only went from 89MB to 145MB in memory. It used to go over 13GB and crash/freeze. However, I had trouble with the smaller files. The contents in the smaller files were not correct until I added `io.read` to the StringIO case: `when StringIO then File.open(path, 'w') { |f| f.write(io.read) }`. Cheers. – Cruz Nunez Oct 30 '20 at 18:02
30

Example 3 in Ruby's net/http documentation shows how to download a document over HTTP; to save the file instead of just loading it into memory, substitute puts with a binary write to a file, e.g., as shown in Dejw's answer.

More complex cases are shown further down in the same document.
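For illustration, a sketch in the style of the documentation's streaming example, with an added check against the Content-Length header to catch truncated downloads (stream_and_verify is a hypothetical name; Content-Length is absent for chunked responses, in which case the check is skipped):

```ruby
require 'net/http'

# Hypothetical helper: stream the body to disk chunk by chunk, then
# compare the number of bytes written against Content-Length if present.
def stream_and_verify(url, path)
  uri = URI.parse(url)
  written = 0
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      File.open(path, 'wb') do |f|
        response.read_body { |chunk| written += f.write(chunk) }
      end
      expected = response['Content-Length']
      raise 'truncated download' if expected && written != expected.to_i
    end
  end
  written
end
```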

noraj
  • 3,964
  • 1
  • 30
  • 38
Arkku
  • 41,011
  • 10
  • 62
  • 84
  • +1 for pointing to existing documentation and further examples. – semperos Dec 29 '10 at 18:04
  • 1
    Here's the link specifically: http://ruby-doc.org/stdlib-2.1.4/libdoc/net/http/rdoc/Net/HTTP.html#class-Net::HTTP-label-Streaming+Response+Bodies – kgilpin Oct 29 '14 at 20:11
28

The following solutions first read the whole content into memory before writing it to disk (for more I/O-efficient solutions, look at the other answers).

You can use open-uri, which is a one-liner:

require 'open-uri'
content = open('http://example.com').read

Or by using net/http

require 'net/http'
File.write("file_name", Net::HTTP.get(URI.parse("http://url.com")))
Felix
  • 4,510
  • 2
  • 31
  • 46
KrauseFx
  • 11,551
  • 7
  • 46
  • 53
  • 11
    This reads the whole file into memory before writing it to disk, so... that can be bad. – kgilpin Oct 29 '14 at 20:07
  • @kgilpin both solutions? – KrauseFx Oct 29 '14 at 20:34
  • That said, if you're OK with that, a shorter version (assuming url and filename are in variables `url` and `file`, respectively), using `open-uri` as in the first: `File.write(file, open(url).read)`... Dead simple, for the trivial download case. – lindes Oct 09 '15 at 13:20
18

Expanding on Dejw's answer (edit2):

require 'net/http'

File.open(filename, 'wb') { |f|
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) { |http|
    http.request_get(uri.path) { |res|
      res.read_body { |seg|
        f << seg
        # hack -- adjust to suit:
        sleep 0.005
      }
    }
  }
}

where filename and url are strings.

The sleep command is a hack that can dramatically reduce CPU usage when the network is the limiting factor. Net::HTTP doesn't wait for the buffer (16 kB in v1.9.2) to fill before yielding, so the CPU busies itself moving small chunks around. Sleeping for a moment gives the buffer a chance to fill between writes, and CPU usage becomes comparable to a curl solution; I saw a 4-5x difference in my application. A more robust solution might examine the progress of f.pos and adjust the sleep time to target, say, 95% of the buffer size -- in fact, that's how I got the 0.005 number in my example.

Sorry, but I don't know a more elegant way of having Ruby wait for the buffer to fill.

Edit:

This is a version that automatically adjusts itself to keep the buffer just at or below capacity. It's an inelegant solution, but it seems to be just as fast, and to use as little CPU time, as it's calling out to curl.

It works in three stages. A brief learning period with a deliberately long sleep time establishes the size of a full buffer. The drop period reduces the sleep time quickly with each iteration, by multiplying it by a larger factor, until it finds an under-filled buffer. Then, during the normal period, it adjusts up and down by a smaller factor.

My Ruby's a little rusty, so I'm sure this can be improved upon. First of all, there's no error handling. Also, maybe it could be separated into an object, away from the downloading itself, so that you'd just call autosleep.sleep(f.pos) in your loop? Even better, Net::HTTP could be changed to wait for a full buffer before yielding :-)

require 'net/http'

def http_to_file(filename, url, opt = {})
  opt = {
    :init_pause => 0.1,    # start by waiting this long each time
                           # it's deliberately long so we can see
                           # what a full buffer looks like
    :learn_period => 0.3,  # keep the initial pause for at least this many seconds
    :drop => 1.5,          # fast reducing factor to find roughly optimized pause time
    :adjust => 1.05        # during the normal period, adjust up or down by this factor
  }.merge(opt)
  pause = opt[:init_pause]
  learn = 1 + (opt[:learn_period] / pause).to_i
  drop_period = true
  delta = 0
  max_delta = 0
  last_pos = 0
  File.open(filename, 'wb') { |f|
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port) { |http|
      http.request_get(uri.path) { |res|
        res.read_body { |seg|
          f << seg
          delta = f.pos - last_pos
          last_pos += delta
          if delta > max_delta then max_delta = delta end
          if learn > 0 then
            learn -= 1
          elsif delta == max_delta then
            if drop_period then
              pause /= opt[:drop]
            else
              pause /= opt[:adjust]
            end
          elsif delta < max_delta then
            drop_period = false
            pause *= opt[:adjust]
          end
          sleep(pause)
        }
      }
    }
  }
end
the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Isa
  • 189
  • 1
  • 3
13

There are more API-friendly libraries than Net::HTTP, for example HTTParty:

require "httparty"
File.open("/tmp/my_file.flv", "wb") do |f| 
  f.write HTTParty.get("http://somedomain.net/flv/sample/sample.flv").parsed_response
end
fguillen
  • 36,125
  • 23
  • 149
  • 210
3

I had problems if the file contained German umlauts (ä, ö, ü). I solved the problem by using:

ec = Encoding::Converter.new('iso-8859-1', 'utf-8')
...
f << ec.convert(seg)
...
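A self-contained sketch of this conversion (the byte values are illustrative Latin-1 text; for genuinely binary files such as .flv, write the bytes unmodified in 'wb' mode instead of converting):

```ruby
# Sketch: each Latin-1 chunk is converted to UTF-8 before being
# appended to the output file. Only appropriate for text content;
# true binary files should be written unmodified in 'wb' mode.
ec = Encoding::Converter.new('iso-8859-1', 'utf-8')
latin1_chunk = "Gr\xF6\xDFe".b.force_encoding('iso-8859-1') # "Größe" in Latin-1
utf8_chunk = ec.convert(latin1_chunk)
# utf8_chunk can now be appended with f << utf8_chunk
```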
Rolf
  • 39
  • 1
0

If you are looking for a way to download a file to a temporary location, do some work with it, and then delete it, try this gem: https://github.com/equivalent/pull_tempfile

require 'pull_tempfile'

PullTempfile.transaction(url: 'https://mycompany.org/stupid-csv-report.csv', original_filename: 'dont-care.csv') do |tmp_file|
  CSV.foreach(tmp_file.path) do |row|
    # ....
  end
end
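If you only need the temporary file during a block and would rather avoid a gem dependency, the same pattern can be sketched with the standard library (with_tempfile_download is a hypothetical name; URI.open requires Ruby 2.5+):

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical gem-free equivalent: download into a Tempfile, yield it
# to the caller, and let Tempfile.create delete it afterwards.
def with_tempfile_download(url)
  Tempfile.create('download') do |tmp_file|
    URI.open(url, 'rb') { |remote| IO.copy_stream(remote, tmp_file) }
    tmp_file.rewind
    yield tmp_file
  end
end

# with_tempfile_download(url) { |f| CSV.foreach(f.path) { |row| ... } }
```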
equivalent8
  • 13,754
  • 8
  • 81
  • 109