45

I've got a URL and I'm using HTTP GET to pass a query along to a page. With my most recent attempt (using net/http), the script doesn't go beyond the 302 response. I've tried several different libraries: HTTPClient, net/http, Rest-Client, Patron...

I need a way to continue to the final page in order to validate an attribute on that page's HTML. The redirect happens because a mobile user agent hits a page that redirects to a mobile view, hence the mobile user agent in the header. Here is my code as it is today:

require 'uri'
require 'net/http'

class Check_Get_Page

    def more_http
        url = URI.parse('my_url')
        req, data = Net::HTTP::Get.new(url.path, {
            'User-Agent' => 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5'
        })
        res = Net::HTTP.start(url.host, url.port) { |http|
            http.request(req)
        }
        cookie = res.response['set-cookie']
        puts 'Body = ' + res.body
        puts 'Message = ' + res.message
        puts 'Code = ' + res.code
        puts "Cookie \n" + cookie
    end

end

m = Check_Get_Page.new
m.more_http

Any suggestions would be greatly appreciated!

max
r3nrut
  • I used [final_redirect_url](https://rubygems.org/gems/final_redirect_url) gem to get the final url after multiple redirections. – Indyarocks May 04 '17 at 20:24

6 Answers

76

To follow redirects, you can do something like this (taken from ruby-doc)

Following Redirection

require 'net/http'
require 'uri'

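def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(uri_str)
  # request_uri keeps the query string and falls back to '/' for an empty path.
  req = Net::HTTP::Get.new(url.request_uri, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # Enable TLS only for https URLs; forcing it on a plain http port fails.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess     then response
  # URI.join resolves a relative Location (e.g. '/inbox') against the current URL.
  when Net::HTTPRedirection then fetch(URI.join(uri_str, response['location']).to_s, limit - 1)
  else
    response.error!
  end
end

print fetch('http://www.ruby-lang.org/').body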
duykhoa
emboss
  • Any clue as to how to add a user-agent to the header? response = Net::HTTP.get_response(URI.parse(uri_str.encode),{'User-Agent' => ua}) I tried that and it doesn't seem to work. I get the following error: c:/Ruby191/lib/ruby/1.9.1/net/http.rb:581:in `initialize': can't convert URI::HTTP into String (TypeError) – r3nrut Aug 04 '11 at 17:12
  • 3
    This does NOT work for a link that is redirected to itself but adding a back-slash, for example, `fetch('http://epn.dk/okonomi2/dk/ECE5373277/chefoekonom-corydon-skyder-langt-over-mal')`, the first iteration, it generates `#`, then exception... – Peter Lee Apr 26 '13 at 08:34
  • 6
    This does not work when the `response['Location']` is a relative path, e.g.: '/inbox'. In such a case, the original uri's path needs to be set, e.g.: `url.path = response['Location']`. – Matt Huggins Jul 19 '13 at 16:53
  • 1
    where you define ua variable? – ecleel Jan 01 '15 at 14:47
  • @MattHuggins According to the [HTTP spec](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html), the location header should always be an absolute URI, never a relative path. Where are you seeing relative paths? – David Moles May 27 '15 at 22:58
  • @DavidMoles I don't know, that was 2 years ago. But it happened! – Matt Huggins May 28 '15 at 03:51
  • 3
    @DavidMoles -- For instance, `http://www.puzzledragonx.com/en/monster.asp?n=9999` -- curl shows 302 redirect with `Location: /` header, and the above code pattern chokes without @MattHuggins advice. Or rather, with slight tweak -- craft new `new_uri = URI.parse(response['Location'])` then `if new_uri.relative?` set `new_uri.scheme = uri.scheme` and `new_uri.host = uri.host` -- otherwise if you try to update the original path, then any query or fragment section will still remain from the original uri. – DreadPirateShawn Sep 28 '15 at 07:35
  • http://docs.ruby-lang.org/en/2.0.0/Net/HTTP.html#class-Net::HTTP-label-Following+Redirection – Leonardo Apr 01 '16 at 16:02
  • @DreadPirateShawn, I believe port also need to be updated from old uri. – Konstantin Jan 29 '18 at 10:44
  • 1
    @MattHuggins, @DreadPirateShawn: Rather than copying specific URI attributes, Use `URI.join(old_uri, new_location)`. That will keep any attributes unspecified in `new_location` from the old URI, but use the new scheme or hostname if they are provided. – sondra.kinsey Sep 15 '18 at 21:08
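The `URI.join` suggestion from the comments can be sketched as below. This is only an illustration, not the answer author's code: `resolve_location` is a hypothetical helper name, and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper: turn a Location header value into an absolute URL,
# using the URL that was just requested as the base.
def resolve_location(current_url, location)
  URI.join(current_url, location).to_s
end

# Relative Location: scheme, host and port come from the current URL,
# while the old path and query string are replaced.
puts resolve_location('http://example.com:8080/en/monster.asp?n=9999', '/')

# Absolute Location: the new URL wins outright.
puts resolve_location('http://example.com/start', 'https://m.example.com/home')
```

Passing strings in and out keeps the helper easy to drop into a recursive `fetch(uri_str, limit)` style function.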
10

Given a URL that redirects

url = 'http://httpbin.org/redirect-to?url=http%3A%2F%2Fhttpbin.org%2Fredirect-to%3Furl%3Dhttp%3A%2F%2Fexample.org'

A. Net::HTTP

begin
  response = Net::HTTP.get_response(URI.parse(url))
  url = response['location']
end while response.is_a?(Net::HTTPRedirection)

Make sure that you handle the case when there are too many redirects.
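One possible way to cap the number of redirects is sketched below. The names `follow_redirects`, `TooManyRedirects` and the `fetcher:` parameter are made up for this illustration; the default fetcher is `Net::HTTP.get_response`, and the injectable fetcher is only there so the loop can be exercised without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class TooManyRedirects < StandardError; end

# Follows at most max_redirects redirects and returns [final_uri, response].
# fetcher is any callable that takes a URI and returns a Net::HTTPResponse.
def follow_redirects(url, max_redirects: 5, fetcher: Net::HTTP.method(:get_response))
  uri = URI.parse(url)
  # One initial request plus up to max_redirects follow-ups.
  (max_redirects + 1).times do
    response = fetcher.call(uri)
    return [uri, response] unless response.is_a?(Net::HTTPRedirection)

    # Resolve relative Location values against the URL just requested.
    uri = URI.join(uri.to_s, response['location'])
  end
  raise TooManyRedirects, "gave up after #{max_redirects} redirects"
end
```

Calling `final_uri, response = follow_redirects(url)` would then give both the last URL and its response. Sending a custom User-Agent would need a fetcher that builds the request itself.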

B. OpenURI

require 'open-uri'

# Since Ruby 3.0, Kernel#open no longer opens URLs; call URI.open instead.
URI.open(url).read

OpenURI::OpenRead#open follows redirects by default, but it doesn't limit the number of redirects.

Panic
6

I wrote another class for this based on examples given here, thank you very much everybody. I added cookies, parameters and exceptions and finally got what I need: https://gist.github.com/sekrett/7dd4177d6c87cf8265cd

require 'uri'
require 'net/http'
require 'openssl'

class UrlResolver
  def self.resolve(uri_str, agent = 'curl/7.43.0', max_attempts = 10, timeout = 10)
    attempts = 0
    cookie = nil

    until attempts >= max_attempts
      attempts += 1

      url = URI.parse(uri_str)
      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = timeout
      http.read_timeout = timeout
      path = url.path
      path = '/' if path == ''
      path += '?' + url.query unless url.query.nil?

      params = { 'User-Agent' => agent, 'Accept' => '*/*' }
      params['Cookie'] = cookie unless cookie.nil?
      request = Net::HTTP::Get.new(path, params)

      if url.instance_of?(URI::HTTPS)
        http.use_ssl = true
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      end
      response = http.request(request)

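      case response
        when Net::HTTPSuccess then
          # Final page reached: return its URL right away, so a success on the
          # last allowed attempt is not reported as too many redirects.
          return uri_str
          # return response.body
        when Net::HTTPRedirection then
          location = response['Location']
          cookie = response['Set-Cookie']
          new_uri = URI.parse(location)
          # Keep uri_str a String; URI.parse rejects a URI object on the next pass.
          uri_str = if new_uri.relative?
                      (url + location).to_s
                    else
                      new_uri.to_s
                    end
        else
          raise 'Unexpected response: ' + response.inspect
      end

    end
    # Only reached when every attempt ended in another redirect.
    raise 'Too many http redirects'
  end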
end

puts UrlResolver.resolve('http://www.ruby-lang.org')
sekrett
  • Thanks for this code snippet! I think you may want to close the http connections (`finish`) though so they don't leak. Much appreciated! – gmcnaughton May 20 '16 at 15:35
  • Definitely the best solution for me thus far. I could easily work with the page with `html_to_parse = Nokogiri::HTML(UrlResolver.resolve('http://www.ruby-lang.org'))` afterwards. Thanks. – DemitryT Jul 15 '16 at 11:08
  • I am not sure 100%, but in Ruby I think every object get destroyed automatically when get out of scope of def function. – sekrett Aug 08 '16 at 10:08
  • You can also use `url.request_uri` instead of manually constructing `path`, it includes the query params. – gmcnaughton Sep 27 '16 at 16:04
  • @gmcnaughton, nice. Can you send me a pull request on Github? – sekrett Sep 28 '16 at 14:47
3

The reference that worked for me is here: http://shadow-file.blogspot.co.uk/2009/03/handling-http-redirection-in-ruby.html

Compared to most examples (including the accepted answer here), it's more robust as it handles URLs which are just a domain (http://example.com - needs to add a /), handles SSL specifically, and also relative URLs.
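As a rough sketch of two of those points (the bare-domain path and the SSL switch), the connection settings can be derived from the parsed URI. `connection_params` is a hypothetical helper name used only for this illustration and is not taken from the linked post.

```ruby
require 'uri'

# Hypothetical helper: collect what Net::HTTP needs from a URL string.
def connection_params(url_str)
  uri = URI.parse(url_str)
  {
    host: uri.host,
    port: uri.port,
    # TLS is needed only when the scheme is https.
    use_ssl: uri.scheme == 'https',
    # request_uri yields '/' for a bare domain and keeps any query string.
    request_uri: uri.request_uri
  }
end

p connection_params('http://example.com')
p connection_params('https://example.com/search?q=ruby')
```

The resulting values map onto `Net::HTTP.start(host, port, use_ssl: ...)` and `Net::HTTP::Get.new(request_uri)`.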

Of course you would be better off using a library like RESTClient in most cases, but sometimes the low-level detail is necessary.

mahemoff
1

Maybe you can use the curb-fu gem (https://github.com/gdi/curb-fu). The only catch is that it needs some extra code to make it follow redirects. I've used the following before. Hope it helps.

require 'rubygems'
require 'curb-fu'

module CurbFu
  class Request
    module Base
      def new_meth(url_params, query_params = {})
        curb = old_meth url_params, query_params
        curb.follow_location = true
        curb
      end

      alias :old_meth :build
      alias :build :new_meth
    end
  end
end

#this should follow the redirect because we instruct
#Curb.follow_location = true
print CurbFu.get('http://<your path>/').body
Yesh
  • I've had complications in getting curb-fu to work on my Windows machine using Ruby 1.9.1p430... I can get it to work on my Mac but since this is something I have to run on a Windows server I need curb-fu to complete installation. Thanks for the suggestion. – r3nrut Aug 04 '11 at 14:48
0

If you do not need to handle each redirect yourself, you can use the Mechanize library, which follows redirects for you:

require 'mechanize'

agent = Mechanize.new
begin
    # url is the address you want to fetch
    response = agent.get(url)
rescue Mechanize::ResponseCodeError
    # response codes other than 200, 301, or 302
rescue Timeout::Error
    # the request timed out
rescue Mechanize::RedirectLimitReachedError
    # too many redirects
rescue StandardError
    # anything else
end

It will return the destination page. You can also turn off redirect following like this:

agent.redirect_ok = false

You can also change request settings, such as the user agent:

agent.user_agent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Mobile Safari/537.36"
quangkid