I am having an issue with the redirect-following behavior of Ruby's OpenURI.

When I request a URL that contains %20 and the server answers with a 30x redirect, OpenURI fails.

  • The exact same URL, with a + instead of %20 works.
  • Both the %20 and + versions work properly with curl -L (follow).

Code

require 'open-uri'

base = "http://software-engineering-handbook.com/Handbook"

puts "===> PASS: URI Open +"
# URI.open (Ruby 2.5+); plain Kernel#open no longer opens URLs in Ruby 3.0+
result = URI.open "#{base}/Video+Series"
p result.status

puts "===> PASS: Curl +"
puts `curl -LIsS "#{base}/Video+Series" | grep HTTP`

puts "===> PASS: Curl %20"
puts `curl -LIsS "#{base}/Video%20Series" | grep HTTP`

puts "===> FAIL: URI Open %20"
begin
  result = URI.open "#{base}/Video%20Series"
  p result.status
rescue => e
  puts "#{e.class} #{e.message}"
end

Output

===> PASS: URI Open +
["200", "OK"]
===> PASS: Curl +
HTTP/1.1 200 OK
===> PASS: Curl %20
HTTP/1.1 303 See Other
HTTP/1.1 200 OK
===> FAIL: URI Open %20
OpenURI::HTTPError 302 Found (Invalid Location URI)

I am not sure what is going on here. I also tried HTTParty (although I know it is just a wrapper), hoping for different behavior, but it fails as well.

DannyB
  • Not at all. The mentioned question is not related, besides the fact that it mentions spaces in URL encoding. – DannyB Dec 28 '19 at 21:32

1 Answer

The server is responding with a redirect to an invalid URI. curl is lax about it, but Ruby is strict.

Printing e.cause gives more information:

#<URI::InvalidURIError: bad URI(is not URI?): "http://software-engineering-handbook.com/Handbook/Video Series/">
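Why does e.cause carry the detail? When a new exception is raised inside a rescue clause, Ruby attaches the exception being handled as its cause, and open-uri appears to raise its HTTPError from inside a rescue of URI::InvalidURIError when parsing the Location header. The offline sketch below only mirrors that pattern; RedirectError and parse_location are hypothetical names, not open-uri's code.

```ruby
require "uri"

# Hypothetical stand-in for the error open-uri raises on a bad Location.
class RedirectError < StandardError; end

# Parse a Location value; on failure, raise a terser wrapper error.
# Because the raise happens inside the rescue, Ruby sets RedirectError#cause
# to the original URI::InvalidURIError automatically.
def parse_location(location)
  URI.parse(location)
rescue URI::InvalidURIError
  raise RedirectError, "Invalid Location URI"
end

begin
  parse_location("http://software-engineering-handbook.com/Handbook/Video Series/")
rescue RedirectError => e
  puts e.message        # only the wrapper's short message
  puts e.cause.class    # the original parse error class
  puts e.cause.message  # includes the offending Location value
end
```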

The response headers from curl -I 'http://software-engineering-handbook.com/Handbook/Video%20Series' confirm it:

HTTP/1.1 303 See Other
Server: Cowboy
Date: Sat, 28 Dec 2019 21:41:28 GMT
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Location: http://software-engineering-handbook.com/Handbook/Video Series/

So the server is indeed returning an invalid URI: spaces are not allowed in a URI path (they must be percent-encoded, e.g. as %20), and Ruby's URI class refuses to parse it.

> URI("http://software-engineering-handbook.com/Handbook/Video Series/")
URI::InvalidURIError: bad URI(is not URI?): "http://software-engineering-handbook.com/Handbook/Video Series/"
from /Users/schwern/.rvm/rubies/ruby-2.6.5/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
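If you cannot fix the server, one workaround is to follow redirects yourself and percent-encode illegal characters in each Location header before parsing it. The sketch below assumes Net::HTTP instead of open-uri, and the names URI_UNSAFE, sanitize_location and fetch_following_redirects are hypothetical. It only handles GET, ignores cookies and other redirect subtleties, and is not a drop-in replacement for a full HTTP client.

```ruby
require "net/http"
require "uri"

# Anything outside RFC 3986's unreserved and reserved characters, plus "%"
# so that existing escapes such as %20 are left untouched.
URI_UNSAFE = %r{[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]}

# Percent-encode each rejected character byte by byte (a space becomes %20).
def sanitize_location(location)
  location.gsub(URI_UNSAFE) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# GET the URL, following up to +limit+ redirects. Each Location header is
# repaired before parsing; relative values are resolved against the current URI.
def fetch_following_redirects(url, limit: 5)
  uri = URI(url)
  limit.times do
    response = Net::HTTP.get_response(uri)
    return response unless response.is_a?(Net::HTTPRedirection)

    location = response["location"]
    return response if location.nil? # e.g. 304 Not Modified has no Location

    uri = uri.merge(sanitize_location(location))
  end
  raise "too many redirects"
end
```

Usage would look like fetch_following_redirects("http://software-engineering-handbook.com/Handbook/Video%20Series").code. Encoding byte by byte keeps the function independent of URI's parser, which is exactly what rejects the raw Location value.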
Schwern
  • Thanks. This is the only HTTP client I found that does not understand this. Even Ruby's HTTP gem follows it properly. I will probably switch to it instead. – DannyB Dec 28 '19 at 22:03
  • Don't use the HTTP library unless you're designing a new client from scratch. Instead use one of the many HTTP clients that already exist. Writing a client that covers all the bases wastes your time when other excellent ones already exist. – the Tin Man Dec 29 '19 at 05:20