33

For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.

I'm close. I know I need to use OpenURI, and it should look something like this:

require 'open-uri'
open(url) {
  # do something mysterious here to get page_string
}
puts page_string

Can anyone suggest what I need to add?

the Tin Man
AP257

8 Answers

65

You can do the same without OpenURI:

require 'net/http'
require 'uri'

# Named fetch_page rather than open, so it does not shadow Kernel#open.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

page_content = fetch_page('http://www.google.com')
puts page_content

Or, more succinctly:

Net::HTTP.get(URI.parse('http://www.google.com'))
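One caveat with the one-liner: Net::HTTP.get does not follow redirects, so a 301 or 302 response hands back the redirect page rather than the content you wanted. The following is a minimal sketch of one way to handle that. The helper name `fetch_body` and the limit of five redirects are assumptions made for illustration, not part of the original answer.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return the body at `url`, following up to `limit` redirects.
def fetch_body(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    fetch_body(URI.join(url, response['location']).to_s, limit - 1)
  else
    raise "request failed: #{response.code} #{response.message}"
  end
end
```

Calling `fetch_body('http://www.google.com')` would then return the final page as a string, or raise if the server answers with an error status.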
Jason Swett
Carlo Pecchia
  • 11
    What's the disadvantage of using open-uri? – Watusimoto Sep 20 '12 at 11:30
  • 5
    Yeah, it's super confusing that this more-complicated answer has way more upvotes than the other ones. I tried searching for a reason myself and found [this question/answer](http://stackoverflow.com/a/16764302/199712) that seems to recommend OpenURI over Net::HTTP in most cases, which just makes things more confusing. THANKS, OBAMA – Jason Swett Jul 29 '14 at 22:27
  • 7
    open-uri internally patches `Kernel.open`. Here is an [article](http://sakurity.com/blog/2015/02/28/openuri.html) talking about things one should be aware of when using open-uri. I have also come across method naming conflicts `open` when using it together with other libraries such as bunny gem (which also implements `open`) – EricC Jun 05 '15 at 03:28
  • 8
    complicated? This is super simple (you can also do it on one line `Net::HTTP.get(URI.parse('http://www.google.com'))`. And it doesn't do crazy things under the hood. – akostadinov Nov 19 '15 at 17:56
22

The open method passes an IO representation of the resource to your block when it yields. You can read from it using the IO#read method:

open([mode [, perm]] [, options]) [{|io| ... }] 
open(path) { |io| data = io.read }
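To make the "IO representation" concrete, here is a small sketch of what the block can do with the object it receives. It assumes `URI.open` (the form Ruby 3.0+ requires) rather than the bare `open` shown above, and the helper name `describe_page` is hypothetical. Besides `read`, the object OpenURI yields also responds to metadata methods such as `status` and `content_type`.

```ruby
require 'open-uri'

# Hypothetical helper: read a page and report a little metadata about it.
def describe_page(url)
  # URI.open returns the value of the block, so the hash below is the result.
  URI.open(url) do |io|
    {
      status: io.status.first,        # status code as a string, e.g. "200"
      content_type: io.content_type,  # e.g. "text/html"
      body: io.read                   # the whole page as a single string
    }
  end
end
```

For example, `describe_page('http://www.google.com')[:body]` would be the page contents as one string.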
Jeriko
12
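require 'open-uri'
# A variable first assigned inside the block is local to it,
# so capture the block's return value instead.
page_string = open(url) do |f|
  f.read
end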

See also the documentation of the IO class.

Teoulas
5

I was also unsure which of the two performs better, so I ran a benchmark of both to make it clearer:

require 'benchmark'
require 'net/http'
require 'uri'
require 'open-uri'

url = 'http://www.google.com'
Benchmark.bm do |x|
  x.report('net-http:') { Net::HTTP.get_response(URI.parse(url)).body }
  x.report('open-uri:') { open(url) { |f| f.read } }
end

The result:

              user     system      total        real
net-http:  0.000000   0.000000   0.000000 (  0.097779)
open-uri:  0.030000   0.010000   0.040000 (  0.864526)

Which one to use depends on your requirements and on how you want to process the response.
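A single timed request is easily skewed by network conditions, so it may be worth averaging several runs before drawing conclusions. This is only a sketch: the helper name `average_realtime` and the default of five runs are assumptions, and the usage lines are left as comments because they perform real network requests.

```ruby
require 'benchmark'

# Hypothetical helper: average wall-clock seconds over `runs` calls of the block.
def average_realtime(runs = 5)
  total = runs.times.sum { Benchmark.realtime { yield } }
  total / runs
end

# Example usage (performs real network requests):
#   require 'net/http'
#   require 'open-uri'
#   url = 'http://www.google.com'
#   puts average_realtime { Net::HTTP.get(URI.parse(url)) }
#   puts average_realtime { URI.open(url, &:read) }
```

Even averaged, the numbers mostly reflect the connection and the remote server rather than the libraries themselves.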

the Tin Man
Gagan Gami
  • I tried a similar benchmark, and have found both methods to be about equal in speed, although it's hard to tell because external factors affect download speed (eg. wifi, other apps) – sondra.kinsey Aug 31 '18 at 15:00
4

To make the code a little clearer: OpenURI's open method returns the value returned by the block, so you can assign open's return value directly to your variable. For example:

xml_text = open(url) { |io| io.read }
ndnenkov
Keith Bennett
  • nice, here's a one liner to get amazon EC2 public IP ranges: `ruby -r json -ropen-uri -e 'JSON.parse(open("https://ip-ranges.amazonaws.com/ip-ranges.json") { |io| io.read })["prefixes"].each {|p| puts #{p["ip_prefix"] if p["service"]=="EC2"}; '` – akostadinov Feb 05 '15 at 14:51
  • fixed typo in the one-liner: `ruby -r json -r open-uri -e 'JSON.parse(open("https://ip-ranges.amazonaws.com/ip-ranges.json") { |io| io.read })["prefixes"].each {|p| puts p["ip_prefix"] if p["service"]=="EC2"}; '` – Magnus May 13 '15 at 14:15
3

Starting with Ruby 3.0, calling URI.open via Kernel#open has been removed, so instead call URI.open directly:

require 'open-uri'
page_string = URI.open(url, &:read)
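If the URL comes from user input, it may be worth checking that it really is an HTTP(S) URL before fetching, and turning HTTP errors into something readable. A minimal sketch along those lines follows. The helper name `page_string_for` and the error messages are made up for illustration.

```ruby
require 'open-uri'

# Hypothetical helper: return the page at `url` as a single String.
def page_string_for(url)
  uri = begin
    URI.parse(url)
  rescue URI::InvalidURIError
    nil
  end
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  raise ArgumentError, "not an HTTP(S) URL: #{url.inspect}" unless uri.is_a?(URI::HTTP)

  URI.open(uri, &:read)
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, message] pair such as ["404", "Not Found"].
  raise "fetch failed with HTTP status #{e.io.status.first}"
end
```

With this, `page_string_for('/etc/passwd')` raises an ArgumentError instead of attempting to open anything.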
fn control option
1

Try the following instead:

require 'open-uri' 
content = URI(your_url).read
davidrac
gayavat
  • 2
    Not sure why this is currently downvoted – on Ruby 3.1 this is the shortest/subjectively nicest way of doing it. – Henrik N Sep 02 '22 at 07:36
-2

require 'open-uri'
# url must specify the protocol, e.g. "http://"
str = open(url) { |f| f.read }
bhups