31

My users submit urls (to mixes on mixcloud.com) and my app uses them to perform web requests.

A good url returns a 200 status code:

require 'net/http'

uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix/")
response = Net::HTTP.get_response(uri)
# => #<Net::HTTPOK 200 OK readbody=true>

But if you forget the trailing slash then our otherwise good url returns a 301:

uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix")
Net::HTTP.get_response(uri)
# => #<Net::HTTPMovedPermanently 301 MOVED PERMANENTLY readbody=true>

The same thing happens with 404s:

# bad path returns a 404
"http://www.mixcloud.com/bad/path/" 
# bad path minus trailing slash returns a 301
"http://www.mixcloud.com/bad/path"

  1. How can I 'drill down' into the 301 to see if it takes us on to a valid resource or an error page?
  2. Is there a tool that provides a comprehensive overview of the rules that a particular domain might apply to their urls?
Douglas F Shearer
stephenmurdoch

5 Answers

55

301 redirects are fairly common when the URL is not typed exactly as the web server expects it. They happen much more often than you'd think; you just don't normally notice them while browsing because the browser follows them automatically.

Two alternatives come to mind:

1: Use open-uri

open-uri handles redirects automatically. So all you'd need to do is:

require 'open-uri'
...
# On Ruby 3.0+ call URI.open; the bare open(...) form no longer fetches URLs.
response = URI.open('http://xyz...').read

If you have trouble redirecting between HTTP and HTTPS, then have a look here for a solution:
Ruby open-uri redirect forbidden

2: Handle redirects with Net::HTTP

def get_response_with_redirect(uri)
   r = Net::HTTP.get_response(uri)
   if r.code == "301"
     r = Net::HTTP.get_response(URI.parse(r['location']))
   end
   r
end
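
The method above follows a single 301. A minimal sketch of a version that follows a chain of redirects, resolves relative Location headers and gives up after a fixed number of hops might look like this. The name get_response_following_redirects, the limit of 5 and the fetcher parameter are assumptions made for illustration; fetcher only exists so the logic can be exercised without a network connection.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): follows up to `limit` redirects
# and returns [final_response, final_uri].
# `fetcher` is any callable that takes a URI and returns a Net::HTTPResponse;
# it defaults to Net::HTTP.get_response.
def get_response_following_redirects(url, limit: 5, fetcher: Net::HTTP.method(:get_response))
  uri = URI.parse(url)
  (limit + 1).times do
    response = fetcher.call(uri)
    location = response['location']
    # Stop on anything that is not a redirect carrying a Location header
    # (this also covers 304 Not Modified, which has no Location).
    return [response, uri] unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "Too many redirects (more than #{limit}) for #{url}"
end
```

The returned response tells you whether the redirect ended on a valid resource (response.is_a?(Net::HTTPSuccess)) or on an error page such as a 404, and the returned URI is the address that was finally requested.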

If you want to be even smarter, you could try adding or removing the trailing slash on the URL when you get a 404 response. You could do that with a method like get_response_smart, which handles this URL fiddling in addition to the redirects.
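
As a rough sketch of the URL fiddling part, a helper like the one below could produce the alternative URL to retry after a 404. The name toggle_trailing_slash is hypothetical, and leaving the root path untouched is an assumption about what is sensible here.

```ruby
require 'uri'

# Hypothetical helper: returns the URL with the trailing slash on its path
# added if it is missing, or removed if it is present. The query string,
# if any, is preserved.
def toggle_trailing_slash(url)
  uri = URI.parse(url)
  path = uri.path.to_s
  return url if path == '/' # nothing sensible to toggle on the root path
  uri.path = path.end_with?('/') ? path.chomp('/') : "#{path}/"
  uri.to_s
end
```

A get_response_smart method could request the original URL first and, only if that ends in a 404, retry once with toggle_trailing_slash(url).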

Martin Dorey
Casper
  • Thanks, that explains everything perfectly. I will go with option 2. – stephenmurdoch Aug 26 '11 at 21:11
  • 1
    @stephen - Great :) If you want to learn more about HTTP codes you can look at the specs directly here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html – Casper Aug 26 '11 at 21:31
  • 3
    Would work with multiple redirects if you change line 4 to: `r = get_response_with_redirect(URI.parse(r.header['location']))` – ReggieB Jul 04 '13 at 10:24
  • Thanks. You can also use the constant `Net::HTTPMovedPermanently`, example: `if r.is_a? Net::HTTPMovedPermanently`. – Ivan Black May 23 '14 at 10:56
  • 2
    I can't seem to get open-uri to follow redirects. Is this something that changed in a more recent ruby version? (running 2.2) – dmur Mar 02 '16 at 01:00
  • 1
    @dmur Are you redirecting between HTTP and HTTPS? If so, have a look here: http://stackoverflow.com/questions/27407938/ruby-open-uri-redirect-forbidden – Casper Mar 02 '16 at 05:53
  • `response = open('http://xyz...').read` would probably be what most people will need. – DannyB May 19 '16 at 14:44
8

I can't figure out how to comment on the accepted answer (this question might be closed), but I should note that r.header is now obsolete, so r.header['location'] should be replaced by r['location'] (per https://stackoverflow.com/a/6934503/1084675 )
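
For illustration, here is a small sketch that reads the header with r['location'] and branches on the response class rather than on the status code string. The describe_response name and the wording of the strings it returns are made up for this example.

```ruby
require 'net/http'

# Hypothetical helper: summarises a Net::HTTPResponse as a short string.
# Header lookup via response['location'] is case-insensitive.
def describe_response(response)
  case response
  when Net::HTTPSuccess     then "ok (#{response.code})"
  when Net::HTTPRedirection then "redirect (#{response.code}) to #{response['location']}"
  when Net::HTTPNotFound    then "not found (#{response.code})"
  else                           "other (#{response.code})"
  end
end
```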

Community
PhilGA
4

rest-client follows the redirections for GET and HEAD requests without any additional configuration. It works very nicely.

  • for result codes between 200 and 207, a RestClient::Response will be returned
  • for result codes 301, 302 or 307, the redirection will be followed if the request is a GET or a HEAD
  • for result code 303, the redirection will be followed and the request transformed into a GET

example of usage:

require 'rest-client'

RestClient.get 'http://example.com/resource'

The rest-client README also gives an example of following redirects with POST requests:

begin
  RestClient.post('http://example.com/redirect', 'body')
rescue RestClient::MovedPermanently,
       RestClient::Found,
       RestClient::TemporaryRedirect => err
  err.response.follow_redirection
end
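
The redirect rules listed above can be restated as a small decision table. This is only a plain-Ruby paraphrase for illustration, not rest-client's actual code. The redirect_action name and the returned symbols are hypothetical, and treating every case not listed above as an exception is an assumption based on the POST example.

```ruby
# Illustrative only: restates the documented rules, does not call rest-client.
# status      - Integer HTTP status code
# http_method - Symbol such as :get, :head or :post
def redirect_action(status, http_method)
  case status
  when 200..207
    :return_response
  when 301, 302, 307
    # Followed automatically only for GET and HEAD.
    %i[get head].include?(http_method) ? :follow : :raise
  when 303
    :follow_as_get
  else
    :raise # assumption: anything else surfaces as an exception
  end
end
```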
max pleaner
NickGnd
3

Here is the code I came up with (derived from different examples) which will bail out if there are too many redirects (note that ensure_success is optional):

require "net/http"
require "uri"
class Net::HTTPResponse
  def ensure_success
    unless kind_of? Net::HTTPSuccess
      warn "Request failed with HTTP #{@code}"
      each_header do |h,v|
        warn "#{h} => #{v}"
      end
      abort
    end
  end
end
def do_request(uri_string)
  response = nil
  tries = 0
  loop do
    uri = URI.parse(uri_string)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == "https") # needed when a redirect points at an HTTPS URL
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    uri_string = response['location'] if response['location']
    unless response.kind_of? Net::HTTPRedirection
      response.ensure_success
      break
    end
    if tries == 10
      puts "Timing out after 10 tries"
      break
    end
    tries += 1
  end
  response
end
Blaskovicz
1

Not sure if anyone is looking for this exact solution, but if you are trying to download an image over HTTP or HTTPS and store it in a variable, here is one way to do it with the open_uri_redirections gem:

require 'open_uri_redirections'
require 'net/https'

# Note: VERIFY_NONE turns off TLS certificate verification.
web_contents = open('file_url_goes_here', :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE, :allow_redirections => :all) { |f| f.read }
puts web_contents
chrisallick