I am new to Python and am trying to learn some new modules. Fortunately or unfortunately, I picked up the urllib2 module and started using it with one URL that's causing me problems.
To begin with, I created the Request object and then called read() on the response object. It was failing. It turns out the request is being redirected, but the status code is still 200. I'm not sure what's going on. Here is the code:
import urllib2

def get_url_data(url):
    print "Getting URL " + url
    user_agent = "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    headers = { 'User-Agent' : user_agent }
    request = urllib2.Request(url, str(headers) )
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        # response is never bound when urlopen raises, so inspect the error object
        print e.geturl()
        print e.info()
        print e.getcode()
        return False
    else:
        print response
        print response.info()
        print response.getcode()
        print response.geturl()
        return response
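One detail in the function above that may matter: the second positional argument of `urllib2.Request` is `data`, not `headers`, so `str(headers)` ends up as a request body and the request becomes a POST without the intended User-Agent. The sketch below only builds the two `Request` objects and compares them; nothing is sent over the network. `build_request` is a hypothetical helper, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption, not in the original code)

USER_AGENT = "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"


def build_request(url, as_keyword=True):
    """Hypothetical helper: build a Request without sending it."""
    headers = {'User-Agent': USER_AGENT}
    if as_keyword:
        # headers passed by keyword: a plain GET carrying the User-Agent
        return urllib2.Request(url, headers=headers)
    # headers passed positionally, as in the question: treated as POST data
    return urllib2.Request(url, str(headers))


good = build_request("http://www.chilis.com")
bad = build_request("http://www.chilis.com", as_keyword=False)

print(good.get_method())               # GET
print(bad.get_method())                # POST
print(good.get_header('User-agent'))   # the Firefox string above
print(bad.get_header('User-agent'))    # None: the header was never set
```

If that is what is happening here, passing `headers=headers` in `get_url_data` would be the corresponding change; whether it also explains the empty body is not something the output above can confirm.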
I am calling the above function with "http://www.chilis.com".
I was expecting to receive a 301, 302, or 303, but instead I see 200. Here is the response I see:
Getting URL http://www.chilis.com
<addinfourl at 4354349896 whose fp = <socket._fileobject object at 0x1037513d0>>
Cache-Control: private
Server: Microsoft-IIS/7.5
SPRequestGuid: 48bbff39-f8b1-46ee-a70c-bcad16725a4d
X-SharePointHealthScore: 0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
MicrosoftSharePointTeamServices: 14.0.0.6120
X-MS-InvokeApp: 1; RequireReadOnly
Date: Wed, 13 Feb 2013 11:21:27 GMT
Connection: close
Content-Length: 0
Set-Cookie: BIGipServerpool_http_chilis.com=359791882.20480.0000; path=/
200
http://www.chilis.com/(X(1)S(q24tqizldxqlvy55rjk5va2j))/Pages/ChilisVariationRoot.aspx?AspxAutoDetectCookieSupport=1
Can someone explain what this URL is and how to handle this? I know I can use the "Handling Redirects" section from Diveintopython.net, but even with the code on that page I see the same 200 response.
EDIT: Using the code from DiveintoPython, I see it's a temporary redirection. What I don't understand is why the HTTP status code reported by my code is 200. Isn't that supposed to be the actual return code?
EDIT2: Now that I see it better, it's not a weird redirection at all. I am editing the title.
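On the 200 itself: urllib2's default opener includes an `HTTPRedirectHandler` that follows 3xx responses on its own, so `getcode()` describes the final response rather than the first one. One way to see the intermediate status codes is to subclass that handler and record each redirect before it is followed. This is a minimal sketch, not the DiveIntoPython code; `RecordingRedirectHandler` and `open_recording_redirects` are hypothetical names, and the import fallback is an assumption so it also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)


class RecordingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler: remembers each redirect, then follows it as usual."""

    def __init__(self):
        self.seen = []  # (status code, target URL) for every redirect encountered

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.seen.append((code, newurl))
        # Delegate to the stock behaviour so the redirect is still followed.
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


def open_recording_redirects(url):
    """Open url and return (final response, list of recorded redirects)."""
    recorder = RecordingRedirectHandler()
    opener = urllib2.build_opener(recorder)  # replaces the default redirect handler
    response = opener.open(url)
    return response, recorder.seen
```

Calling `response, seen = open_recording_redirects("http://www.chilis.com")` should then leave the 3xx codes in `seen`, while `response.getcode()` still reports the status of the last hop.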
EDIT3: If urllib2 follows the redirection automatically, I am not sure why the following code does not get the front page for chilis.com.
from bs4 import BeautifulSoup

docObj = get_url_data(url)
doc = docObj.read()
soup = BeautifulSoup(doc, 'lxml')
print(soup.prettify())
If I use the URL that the browser eventually ends up being redirected to, it works ("http://www.chilis.com/EN/Pages/home.aspx").
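One thing that might be worth trying, though it is unconfirmed as the cause: the `AspxAutoDetectCookieSupport=1` parameter and the `Set-Cookie` header in the output above suggest the server is probing for cookie support, and the plain `urllib2.urlopen` call does not keep cookies between the redirect hops. The sketch below builds an opener with a cookie jar attached; `build_cookie_opener` is a hypothetical helper, and the import fallback is an assumption so it also runs on Python 3.

```python
try:
    import urllib2    # Python 2
    import cookielib
except ImportError:
    import urllib.request as urllib2    # Python 3 fallback (assumption)
    import http.cookiejar as cookielib

USER_AGENT = "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"


def build_cookie_opener():
    """Hypothetical helper: an opener that stores and resends cookies."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    # addheaders applies to every request made through this opener
    opener.addheaders = [('User-Agent', USER_AGENT)]
    return opener, jar
```

With this, `opener.open("http://www.chilis.com").read()` would go through the same redirects but send back any cookies set along the way; if the body is still empty, the cookie check is probably not the explanation.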