13

I want to get the content of this* page. Everything I've looked up suggests parsing CSS elements, but that page has none.

Here's the only code that I found that looked like it should work:

file = File.open('http://hiscore.runescape.com/index_lite.ws?player=zezima', "r")
contents = file.read
puts contents

Error:

tracker.rb:1:in 'initialize': Invalid argument - http://hiscore.runescape.com/index_lite.ws?player=zezima (Errno::EINVAL)
  from tracker.rb:1:in 'open'
  from tracker.rb:1

*http://hiscore.runescape.com/index_lite.ws?player=zezima

If you try to format this as a link in the post it doesn't recognize the underscore (_) in the URL for some reason.

Andrew

3 Answers

42

You want the open() method provided by the Kernel module, which can read from URIs. You just need to require the open-uri library first:

require 'open-uri'

Used like so:

require 'open-uri'
file = open('http://hiscore.runescape.com/index_lite.ws?player=zezima')
contents = file.read
puts contents

This related SO thread covers the same question:

Open an IO stream from a local file or url
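
On current Ruby versions the same idea is spelled slightly differently. Here is a minimal sketch, assuming Ruby 2.5 or later, where open-uri provides `URI.open` (plain `open()` stopped accepting URLs in Ruby 3.0). `fetch_text` is a hypothetical helper name, not part of any library:

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL and return the response body as a String.
# The &:read block reads the stream; URI.open closes it when the block returns.
def fetch_text(url)
  URI.open(url, &:read)
end
```

Calling `puts fetch_text('http://hiscore.runescape.com/index_lite.ws?player=zezima')` would then print the page, provided that host still answers.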

Cody Caughlan
  • I see - didn't know that. Still, depending on what he is wanting to do with that content he might be better off with net/http. – halfdan Dec 06 '09 at 03:23
  • Oo, that's even better. Thanks. – Andrew Dec 06 '09 at 04:32
  • @halfdan - totally agree that net/http is better in general. I don't rely on this method for anything non-trivial / production. net/http has its shortcomings and I generally prefer the curl bindings (lib curb). This post has good info on http client performance: http://bit.ly/lvriR. curb is great because you have much finer-grained control over the timeouts, which is super critical in high-volume production usage. – Cody Caughlan Dec 06 '09 at 23:48
  • Do we need to use this syntax "source = open('http://www.google.com', &:read)" if we want the file closed? Someone elsewhere on SO said file.read alone won't close the file? Please weigh in on our question if you don't mind: http://stackoverflow.com/questions/21270239/ruby-what-is-the-difference-between-using-open-and-nethttp-module-to-fetch?noredirect=1#comment32065981_21270239. – Crashalot Jan 23 '14 at 10:37
  • You don't have to use that syntax, but you can; it saves the second line that does the read. The second argument just passes a block to the open() call: the block (the `read`) runs after the open succeeds and its result is returned. The one difference is that the block form closes the stream when the block returns. Otherwise it's six of one, half a dozen of the other. – Cody Caughlan Jan 24 '14 at 17:13
  • @CodyCaughlan And how would you go about updating asset paths in pulled html so that pulled html displays as it would if I navigate directly to that URL in browser? – saihgala Jun 01 '14 at 20:57
7

The appropriate way to fetch the content of a website is through the Net::HTTP module in Ruby:

require 'uri'
require 'net/http'
url = "http://hiscore.runescape.com/index_lite.ws?player=zezima"
# Pass the whole URI so the query string (?player=zezima) is sent as well;
# URI#path alone would drop it.
r = Net::HTTP.get_response(URI.parse(url))
puts r.body

File.open() does not support URIs.
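
Since the comments above raise timeouts as a concern for production use, here is a hedged sketch of a Net::HTTP fetch with explicit timeouts and a status check. The helper name `fetch_body` and the default timeout values are assumptions chosen for illustration, not anything prescribed by the library:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL with explicit timeouts and return the body.
# The 5 s / 10 s defaults are illustrative only.
def fetch_body(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout   # seconds to wait for the connection
  http.read_timeout = read_timeout   # seconds to wait for each read
  # request_uri is the path plus the query string, e.g. "/index_lite.ws?player=zezima"
  response = http.get(uri.request_uri)
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end
```

Raising on any non-2xx response is one possible design choice; a caller that needs to follow redirects would have to handle `Net::HTTPRedirection` itself.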

Best wishes,
Fabian

halfdan
7

Please use open-uri; it supports both URIs and local files:

require 'open-uri'
contents  = open('http://www.google.com') {|f| f.read }
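
The comments on the first answer ask whether the block form is needed to get the stream closed. Here is a small sketch of the difference, using `File.open` on a local path as a stand-in so that it runs without network access (an assumption for illustration; the open-uri block form is expected to behave the same way). Both helper names are hypothetical:

```ruby
# Block form: the handle is closed automatically when the block returns.
# Returns the contents and whether the handle ended up closed.
def read_with_block(path)
  handle = nil
  contents = File.open(path) { |f| handle = f; f.read }
  [contents, handle.closed?]
end

# Non-block form: reading does not close the handle; the caller must do it.
def read_without_block(path)
  handle = File.open(path)
  contents = handle.read
  closed_after_read = handle.closed?
  handle.close # explicit cleanup, needed only in this form
  [contents, closed_after_read]
end
```

The first helper reports the handle as closed after reading, while the second reports it as still open until `close` is called explicitly.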
YOU