35

In Python, I'm using urllib2 to open a url. This url redirects to another url, which redirects to yet another url.

I wish to print out the url after each redirect.

For example (-> means "redirects to"):

A -> B -> C -> D

I want to print the URL of B, C and D (A is already known because it's the start URL).

Matthew H
  • 2
    Why not use the requests module? `actualURL = requests.head(passedurl, timeout=100.0, headers={'Accept-Encoding': 'identity'}).headers.get('location', passedurl)` – Ciasto piekarz Jul 26 '14 at 17:47

3 Answers

48

You can easily get D by just asking for the current URL.

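import urllib2

# starturl is A; datagen (the request body) and headers are assumed to be defined elsewhere
req = urllib2.Request(starturl, datagen, headers)
res = urllib2.urlopen(req)  # follows the redirects automatically
finalurl = res.geturl()  # URL of the final response, i.e. D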

To deal with the intermediate redirects you'll probably need to build your own opener, using an HTTPRedirectHandler subclass that records each redirect as it is followed.
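One way this could look is sketched below. It is only an illustration, not a tested recipe from this answer: the names `RecordingRedirectHandler` and `redirect_chain` are made up for the example, and the import shim is there so the same code runs on Python 3 (`urllib.request`) as well as Python 2 (`urllib2`).

```python
try:  # Python 3
    import urllib.request as urllib_request
except ImportError:  # Python 2, where the module is called urllib2
    import urllib2 as urllib_request


class RecordingRedirectHandler(urllib_request.HTTPRedirectHandler):
    """Hypothetical handler that remembers every URL it is redirected to."""

    def __init__(self):
        self.visited = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether this redirect may be followed.
        new_req = urllib_request.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)
        if new_req is not None:
            # newurl has already been resolved to an absolute URL here.
            self.visited.append(newurl)
        return new_req


def redirect_chain(starturl):
    """Return the URLs visited after starturl (B, C, D in the question)."""
    handler = RecordingRedirectHandler()
    # build_opener swaps our subclass in for the default redirect handler.
    opener = urllib_request.build_opener(handler)
    response = opener.open(starturl)
    response.close()
    return handler.visited
```

Printing each hop is then a matter of looping over the result, for example `for url in redirect_chain(starturl): print(url)`. Recording inside `redirect_request` means the last entry should match what `geturl()` reports for the final response.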

chmullig
10

Probably the best way is to subclass urllib2.HTTPRedirectHandler. Dive Into Python's chapter on redirects may be helpful.

Akshit Khurana
Wooble
4

For Python 3, the solution with urllib is much simpler:

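import urllib.request  # a bare "import urllib" does not reliably load the request submodule


def resolve(url):
    # Follows every redirect and returns only the final URL (D)
    return urllib.request.urlopen(url).geturl()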
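If the intermediate URLs are needed as well, one possible Python 3 approach is to stop urllib from following redirects and walk the chain one hop at a time. The sketch below is an assumption-laden illustration rather than part of the original answer: `NoRedirect`, `iter_redirects` and the `max_hops` limit are hypothetical names, and every hop is fetched with a plain GET.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that declines every redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def iter_redirects(url, max_hops=10):
    """Yield each redirect target after url, one hop at a time."""
    opener = urllib.request.build_opener(NoRedirect)
    for _ in range(max_hops):
        try:
            opener.open(url).close()
            return  # not a redirect, so the chain ends here
        except urllib.error.HTTPError as err:
            location = err.headers.get("Location")
            code = err.code
            err.close()
            if code not in (301, 302, 303, 307, 308) or location is None:
                raise
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            yield url
```

With this, `for hop in iter_redirects(start_url): print(hop)` would print B, C and D in order for the chain in the question. The trade-off is one separate request per hop, and the loop simply stops after `max_hops` redirects instead of reporting a redirect loop.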
jadelord
  • 2
    This is the answer I was looking for! However, is this really a full solution? The OP was looking for intermediate redirect URLs `B` and `C` as well, not just the final destination `D`. – Micah Lindstrom Apr 10 '20 at 07:50