I'm writing a Rails app with a simple web crawler that finds all links within a domain, stopping whenever it finds a link that leads outside of the domain. As is normal for Rails developers, I've developed and tested the code mostly on my local machine, then deployed to a staging server to try things out in real life.
When the crawler checks out a URL that redirects to another domain, on my local machine the #open method returns an empty Tempfile object representing the redirection. It doesn't follow the redirect; it just indicates that one happened. I use this information to decide what message to feed back to the user.
However, on the server this same #open method generates a RuntimeError. I'm running the exact same Ruby (2.0.0 p576) and Rails (4.0.3) versions in both environments. I assumed that a given piece of Ruby code, for the same version of Ruby + Rails and the same Rails environment, would have the exact same behavior. It's pretty disconcerting to find that the same code and apparently same environment can have such different results.
Any idea why this same code acts differently on different machines? What files or settings should I look at, or what commands should I run, to try to identify where this different behavior is coming from? I have isolated the problem to the following example.
Thanks in advance!
In the development environment:
Loading production environment (Rails 4.0.3)
2.0.0-p576 :001 > require 'uri'
=> false
2.0.0-p576 :003 > open 'http://www.ruby-doc.org/' # loads fine
=> #<Tempfile:/var/folders/hz/czmbmhds46s37t_pz8j198g40000gn/T/open-uri20141029-42188-51i9ls>
2.0.0-p576 :002 > open 'http://ndic.com' # loads fine
=> #<Tempfile:/var/folders/hz/czmbmhds46s37t_pz8j198g40000gn/T/open-uri20141029-42188-12kbadl>
In the production environment:
Loading production environment (Rails 4.0.3)
2.0.0-p576 :001 > require 'uri'
=> false
2.0.0-p576 :004 > open 'http://www.ruby-doc.org/' # loads fine
=> #<Tempfile:/tmp/open-uri20141029-11034-1sq9rtm>
2.0.0-p576 :002 > open 'http://ndic.com' # error!?
RuntimeError: redirection forbidden: http://ndic.com -> https://ndic.com/
from /usr/local/rvm/rubies/ruby-2.0.0-p576/lib/ruby/2.0.0/open-uri.rb:223:in `open_loop'
from /usr/local/rvm/rubies/ruby-2.0.0-p576/lib/ruby/2.0.0/open-uri.rb:149:in `open_uri'
from /usr/local/rvm/rubies/ruby-2.0.0-p576/lib/ruby/2.0.0/open-uri.rb:689:in `open'
from /usr/local/rvm/rubies/ruby-2.0.0-p576/lib/ruby/2.0.0/open-uri.rb:34:in `open'
from (irb):2
from /usr/local/rvm/gems/ruby-2.0.0-p576/gems/railties-4.0.3/lib/rails/commands/console.rb:90:in `start'
from /usr/local/rvm/gems/ruby-2.0.0-p576/gems/railties-4.0.3/lib/rails/commands/console.rb:9:in `start'
from /usr/local/rvm/gems/ruby-2.0.0-p576/gems/railties-4.0.3/lib/rails/commands.rb:62:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
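The error message in the traceback above appears to carry the same redirect information that the empty Tempfile represents locally. As a rough sketch only, it could be parsed after rescuing the RuntimeError. This assumes the message keeps exactly the "redirection forbidden: A -> B" format shown above; `describe_forbidden_redirect` and `REDIRECT_FORBIDDEN` are hypothetical names, not part of OpenURI.

```ruby
require 'uri'

# Hypothetical pattern matching the message format seen in the traceback.
REDIRECT_FORBIDDEN = /\Aredirection forbidden: (\S+) -> (\S+)\z/

# Hypothetical helper: returns a hash describing the redirect, or nil when
# the message is not a "redirection forbidden" error.
def describe_forbidden_redirect(message)
  match = REDIRECT_FORBIDDEN.match(message)
  return nil unless match

  from = URI.parse(match[1])
  to = URI.parse(match[2])

  {
    from: from.to_s,
    to: to.to_s,
    # true when only the protocol differs, e.g. http -> https
    scheme_changed: from.scheme != to.scheme,
    # true when the redirect leaves the original host
    host_changed: from.host.to_s.downcase != to.host.to_s.downcase
  }
end

# Usage with the exact message from the production console:
p describe_forbidden_redirect('redirection forbidden: http://ndic.com -> https://ndic.com/')
```

In the crawler, this would sit inside a `rescue RuntimeError => e` around the #open call, with `e.message` passed to the helper. Matching on an error string is fragile, so this is a workaround for reporting, not an explanation of the difference between the two machines.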
EDIT:
One commenter asked if the problem could be that the second environment (latest CentOS) lacks the packages to make HTTPS requests. My understanding of the OpenURI library is that this shouldn't matter; if an http:// request will redirect to https://, the initial #open call should just return an object explaining the redirect (analogous to an HTTP response). I've tried directly loading an HTTPS URL like https://ndic.com, and in both environments this fails with an OpenSSL::SSL::SSLError. So I'm still stuck on the question of why the http:// (redirectable) request gets an error only in one environment.
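To compare the two machines more systematically, something like the following could be pasted into both consoles and the output diffed. This is only a sketch of a possible check: `environment_fingerprint` is a hypothetical helper, and the set of fields is a guess at what might differ (the OpenSSL build, which open-uri related files are actually loaded, and proxy variables), not a known cause.

```ruby
require 'rbconfig'
require 'open-uri'

# Hypothetical helper: collects a few facts that can differ between machines
# even when the Ruby and Rails version numbers match.
def environment_fingerprint
  openssl_version =
    begin
      require 'openssl'
      OpenSSL::OPENSSL_VERSION
    rescue LoadError
      nil # Ruby was built without OpenSSL support
    end

  {
    ruby_version: RUBY_VERSION,
    ruby_patchlevel: RUBY_PATCHLEVEL,
    ruby_platform: RUBY_PLATFORM,
    ruby_binary: RbConfig.ruby,
    openssl_version: openssl_version,
    # Every loaded file with "open-uri" or "open_uri" in its path; an extra
    # entry on one machine would point at a gem patching OpenURI.
    open_uri_files: $LOADED_FEATURES.grep(/open[-_]uri/),
    # Proxy settings change how HTTP requests are routed.
    proxy_env: ENV.select { |key, _| key =~ /\A(https?|no)_proxy\z/i }
  }
end

environment_fingerprint.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Running it inside the Rails console rather than plain irb matters here, because only then are the app's gems loaded and visible in `$LOADED_FEATURES`.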