When I use urllib2 and list the response headers, I cannot see the 'Location' header.

In [19]: p = urllib2.urlopen('http://www.example.com')

In [21]: p.headers.items()
Out[21]: 
[('transfer-encoding', 'chunked'),
 ('vary', 'Accept-Encoding'),
 ('server', 'Apache/2.2.3 (CentOS)'),
 ('last-modified', 'Wed, 09 Feb 2011 17:13:15 GMT'),
 ('connection', 'close'),
 ('date', 'Fri, 25 May 2012 03:00:02 GMT'),
 ('content-type', 'text/html; charset=UTF-8')]

If I use telnet and send a GET request by hand, the response does include it:

telnet www.example.com 80
Trying 192.0.43.10...
Connected to www.example.com.
Escape character is '^]'.
GET / HTTP/1.0  
Host:www.example.com

HTTP/1.0 302 Found
Location: http://www.iana.org/domains/example/
Server: BigIP
Connection: close
Content-Length: 0

So, using urllib2, how do I get the value of the 'Location' header?

damon

2 Answers

This is because urllib2 follows redirects by default, so the final response you get back is the page at the redirect target, which has no Location header. If you disable redirect following, you can see the Location header of the 301 or 302 response itself. See: How do I prevent Python's urllib(2) from following a redirect

Borrowing from there:

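import urllib2

class NoRedirection(urllib2.HTTPErrorProcessor):
    # Return every response as-is, so 3xx responses never reach the redirect handler.
    def http_response(self, request, response):
        return response
    https_response = http_response

opener = urllib2.build_opener(NoRedirection)
# The unfollowed 302 response still carries its Location header.
location = opener.open('http://www.example.com').info().getheader('Location')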
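
For Python 3, where urllib2 became urllib.request, the same idea might look roughly like the sketch below. The helper name get_location is hypothetical and not part of either library. Note that replacing HTTPErrorProcessor this way also means 4xx and 5xx responses are returned instead of raising HTTPError.

```python
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Hand every response back untouched, so 3xx replies are not followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def get_location(url):
    """Hypothetical helper: return the Location header of the first response, or None."""
    opener = urllib.request.build_opener(NoRedirection)
    with opener.open(url) as response:
        return response.headers.get("Location")
```

Calling get_location('http://www.example.com') returns whatever Location value the server sends with its first response, or None if that response is not a redirect.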
Joe

Use the geturl method of the file-like object returned by urlopen. It gives the URL that was actually retrieved after any redirects:

>>> f = urllib2.urlopen('http://www.example.com')
>>> f.geturl()
'http://www.iana.org/domains/example/'
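
A Python 3 sketch of the same check, assuming urllib.request in place of urllib2; final_url is a hypothetical helper name. Because geturl() reports the URL of the response that was finally retrieved, it equals the requested URL whenever no redirect took place, so the sketch also returns whether the two differ.

```python
import urllib.request


def final_url(url):
    """Hypothetical helper: return (URL after redirects, whether it differs from url)."""
    with urllib.request.urlopen(url) as response:
        resolved = response.geturl()
    return resolved, resolved != url
```

This only gives the end of the redirect chain. To read the Location header of an intermediate 301 or 302 response, redirect following has to be turned off.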
daedalus
  • Thanks, but I was wondering why the file-like object returned by `urlopen()` doesn't contain a `Location` header. It has other headers, like `server`. – damon May 25 '12 at 04:20
  • Yes, @damon, strange, come to think of it. I am not sure what the philosophy behind this particular design decision was. I have always just accepted that this was a convenient wrapper around that particular header and taken it as-is. – daedalus May 25 '12 at 04:35
  • Didn't work for me, I just got the original URL back – AdjunctProfessorFalcon Jul 10 '15 at 04:27
  • Try the solution by @Joe below? Or ask a new question showing what you have tried? – daedalus Jul 10 '15 at 07:10