I can open a webpage such as Nike's page with Python 2.7's urllib2 library on my Ubuntu desktop. But when I move that code to a Google Compute Engine server (running the same OS), it starts returning an HTTP Error 503: Service Unavailable.

What could cause this error on one machine but not the other and, if possible, how can I make the two machines behave consistently?

Rorschach

2 Answers

When I tried it, that server raised urllib2.HTTPError: HTTP Error 403: Forbidden unless the request carried an 'Accept' header; sending only a 'User-Agent' header still failed. Here is the working code. The 'User-Agent' and 'Connection' headers turned out to be unnecessary, so they are commented out but left for reference:

import urllib2

# Plain string, so it can be used directly as the header value if re-enabled below.
user_agent = 'Mozilla/5.0'
req_headers = {
    # 'User-Agent': user_agent,
    # 'Connection': 'Keep-Alive',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}
request = urllib2.Request('http://www.nike.com/us/en_us/c/men', headers=req_headers)
response = urllib2.urlopen(request)  # raises urllib2.HTTPError on 4xx/5xx responses
data = response.read()
print data

Also see this other Stack Overflow answer, which I used as a reference for the 'Accept' string.

Michelle Welcks
  • Thanks for the suggestion, but I am using the same header across both machines (with different results). I tried your Accept header as well, but it did not change the result. – Rorschach Nov 11 '15 at 20:41
  • @Rorschach Thanks for the update. If you've been hitting their site too much, your IP could have been blocked. (Seen that more than once.) Can you bring up another server with a different IP and try from the new instance to prove/disprove that possibility? – Michelle Welcks Nov 11 '15 at 21:42
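
One way to compare what the two machines actually receive is to capture the status, headers, and start of the body even when the request fails. The following is only a sketch: describe_response and its opener parameter are hypothetical names introduced here, and the import fallback is there only so the same code also runs on Python 3.

```python
try:  # Python 2.7, as in the question
    from urllib2 import HTTPError, Request, urlopen
except ImportError:  # Python 3 moved these names
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen


def describe_response(url, headers=None, opener=urlopen):
    """Return (status, sorted header items, first 200 bytes of body).

    Error responses such as 403 or 503 are described instead of raised,
    so the output from two machines can be compared side by side.
    """
    request = Request(url, headers=headers or {})
    try:
        response = opener(request)
        status, info = response.getcode(), response.info()
    except HTTPError as err:
        # An HTTPError carries the status, headers, and body of the error reply.
        response, status, info = err, err.code, err.headers
    return status, sorted(info.items()), response.read(200)
```

Running describe_response with the same URL and headers on the desktop and on the cloud instance, then comparing the two results, may show whether the server treats the two source addresses differently.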

HTTP Status 503 means, and I quote RFC 2616: "The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header."

So, it's not at all about where the request comes from: it's all about the server being temporarily overloaded or under maintenance. Check for a Retry-After header in the response and wait that long before retrying; if the header is missing, simply retry later.
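
A minimal sketch of that advice, assuming Retry-After arrives in its delta-seconds form (the HTTP-date form just falls back to a default delay). The names fetch_with_retry and parse_retry_after, the 5-second fallback, and the opener and sleep parameters are illustrative choices, not part of urllib2; the import fallback is there only so the same code also runs on Python 3.

```python
import time

try:  # Python 2.7, as in the question
    from urllib2 import HTTPError, urlopen
except ImportError:  # Python 3 moved these names
    from urllib.error import HTTPError
    from urllib.request import urlopen


def parse_retry_after(value, default=5.0):
    """Return the delay in seconds from a Retry-After value, or `default`.

    Only the delta-seconds form (e.g. "120") is handled in this sketch.
    """
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return default


def fetch_with_retry(url, attempts=3, default_delay=5.0,
                     opener=urlopen, sleep=time.sleep):
    """Fetch `url`, waiting and retrying when the server answers 503."""
    for attempt in range(attempts):
        try:
            return opener(url).read()
        except HTTPError as err:
            # Re-raise anything that is not a 503, and the last 503 as well.
            if err.code != 503 or attempt == attempts - 1:
                raise
            sleep(parse_retry_after(err.headers.get('Retry-After'), default_delay))
```

Passing opener and sleep as parameters is only a convenience for trying the logic without touching the network; in normal use, fetch_with_retry(url) with the defaults is enough.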

If persistent (it shouldn't be: 503 means the server is suffering a temporary condition), contact the web site system administrators and get an explanation of what's going on. To repeat, this is strictly about the web server you're contacting, and should be a temporary condition; not at all about your client.

Alex Martelli