36

I have a simple HTML parser (for learning purposes) that I have been working on:

require 'open-uri'
puts "Enter URL to parse HTML: "
url = gets.chomp
puts "Enter tag to parse from: "
tag = gets.chomp
response = open(url).read
title1 = response.index(tag)
title2 = response.index(tag.insert(1,'/')) -1
result = response[(title1 + tag.length - 1)..title2]
print result 

and when I input http://twitter.com, I get this error message:

ERROR: `open_loop': redirection forbidden: http://twitter.com -> https://twitter.com/ (RuntimeError)
from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/open-uri.rb:704:in `open'
from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/open-uri.rb:34:in `open'
from /home/ubuntu/workspace/htmlparse.rb:6:in `<main>' 

Any advice or help? I'm new to Ruby and I am aware of other HTML parsing modules, but I'm doing this to learn Ruby basics. Thanks.

Martin Tournoij
Vikaton
  • I believe that's happening because twitter uses `https`. FWIW - you may want to hit a site like `http://www.example.org` instead of twitter if you're just looking to learn and poke around. – orde Dec 10 '14 at 18:26
  • 1
    I know `http://` websites work, but I thought open-uri automatically redirects to https. `https://twitter.com` works, but http doesn't. Are there any solutions? – Vikaton Dec 10 '14 at 18:37
  • My advice: download that file to some server you control (github, bintray) so it can avoid redirects (sorry had to pipe in) – rogerdpack Apr 28 '18 at 06:57

4 Answers

33

Have a look at the open_uri_redirections gem.

It patches Ruby's OpenURI to allow redirections from HTTP to HTTPS or the other way around.

fivedigit
21

You can also catch the redirect exception and retry with the URL it points to (here, the 'https' one).

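require 'open-uri'

url = "http://classic.ona.io/api/v1/files/3538545?filename=gringgo/attachments/1485229166168.jpg"

uri = URI.parse(url)
tries = 3 # maximum number of attempts, counting the first request

begin
  # redirect: false makes open-uri raise instead of following redirects itself
  uri.open(redirect: false)
rescue OpenURI::HTTPRedirect => redirect
  uri = redirect.uri # assigned from the "Location" response header
  retry if (tries -= 1) > 0
  raise # out of attempts: re-raise the last redirect
end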

Source: https://twin.github.io/improving-open-uri/
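
To try the retry pattern without touching the network, here is a minimal sketch that runs it against a throwaway local server. The server, its `/old` and `/new` paths, and the `fetch_following_redirects` helper are all assumptions made up for this demonstration; only `uri.open(redirect: false)` and `OpenURI::HTTPRedirect` come from the answer above.

```ruby
require 'open-uri'
require 'socket'

# Throwaway local server (hypothetical, for demonstration only):
# GET /old answers with a 302 pointing at /new, anything else returns a body.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

Thread.new do
  loop do
    client = server.accept
    request_line = client.gets.to_s
    # Drain the request headers up to the blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    if request_line.start_with?('GET /old')
      client.write("HTTP/1.1 302 Found\r\n" \
                   "Location: http://127.0.0.1:#{port}/new\r\n" \
                   "Content-Length: 0\r\nConnection: close\r\n\r\n")
    else
      body = 'hello from /new'
      client.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
    end
    client.close
  end
end

# Hypothetical helper: follow up to max_redirects redirects by hand,
# recording every Location we were sent to.
def fetch_following_redirects(url, max_redirects: 3)
  uri = URI.parse(url)
  remaining = max_redirects
  hops = []
  begin
    body = uri.open(redirect: false).read
  rescue OpenURI::HTTPRedirect => redirect
    raise if remaining.zero? # give up and re-raise the redirect
    remaining -= 1
    uri = redirect.uri       # taken from the "Location" header
    hops << uri.to_s
    retry
  end
  [body, hops]
end

body, hops = fetch_following_redirects("http://127.0.0.1:#{port}/old")
puts body
puts "followed: #{hops.join(' -> ')}"
```

Because the redirect target is handled in your own code, you could also inspect `redirect.uri` (for example, to accept only `https` targets) before retrying.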

akbarbin
kayn
8

Ruby 2.4 fixed upgrade redirects (from http -> https) in open-uri, so now:

RUBY_VERSION
=> "2.4.2"

require 'open-uri'
=> true

open('http://twitter.com')
=> #<Tempfile:/tmp/open-uri20170926-24254-1kflwxq>

Source: http://blog.bigbinary.com/2017/03/02/open-uri-in-ruby-2-4-allows-http-to-https-redirection.html
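
If you want to check which redirects your Ruby will follow without making a request, one option is to ask open-uri's own check directly. This is a sketch that assumes Ruby 2.4 or later; `OpenURI.redirectable?` is an internal, undocumented helper (the same one another answer here overrides), so treat it as an implementation detail that may change. The `allowed?` wrapper is a hypothetical name used only for this example.

```ruby
require 'open-uri'

# Hypothetical wrapper around open-uri's internal redirect check.
# It normalises the result to true/false.
def allowed?(from, to)
  OpenURI.redirectable?(URI.parse(from), URI.parse(to)) ? true : false
end

upgrade_allowed   = allowed?('http://twitter.com', 'https://twitter.com/')
downgrade_allowed = allowed?('https://twitter.com/', 'http://twitter.com')

puts "http -> https allowed: #{upgrade_allowed}"
puts "https -> http allowed: #{downgrade_allowed}"
```

On Ruby 2.4+ the first check should pass and the second should not, since open-uri deliberately refuses HTTPS to HTTP downgrades.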

MatzFan
0

In your source file, override open-uri's `redirectable?` method, which decides whether a redirect is allowed, so that it always returns true. This allows every redirect, including HTTPS to HTTP downgrades, which open-uri otherwise forbids.

require 'open-uri'

def OpenURI.redirectable?(uri1, uri2)
  return true
end

url = "http://someurl.jpg"

URI.open(url)
Waitsnake