How do I modify my "download" function to work with 301/302 redirects?

Question

import random
import socket
import urllib2

def download(source_url):
    """Return the body of source_url, or "" on any failure."""
    try:
        socket.setdefaulttimeout(20)
        # Pick a browser-like User-Agent at random for each request.
        agents = [
            'Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1041.0 Safari/535.21',
            'Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0',
        ]
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent', random.choice(agents))
        resp = urllib2.urlopen(ree)
        htmlSource = resp.read()
        return htmlSource
    except Exception, e:
        print e
        return ""

I wrote this download function. How do I make it handle 301/302 redirects?

For example, my function doesn't work with this URL: http://tumblr.com/tagged/long-reads

I can `urlopen` that URL just fine... – Fred Foo, May 07 '12 at 12:06
@larsmans did you try my function with it? – TIMEX, May 07 '12 at 12:09
Yes. Works just fine (Python 2.6 on Linux). – Fred Foo, May 07 '12 at 12:14

score 1 · Answer 1 · edited May 23 '17 at 12:26

First, you have to get the HTTP response code; see this.

If the code is 30x, you have to get the new URL; see this.

Then you can call your download() function recursively with the new URL.

You should also add a parameter that counts redirections, to avoid infinite looping.
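
The steps above can be sketched roughly as follows. This is only an illustration, not the asker's code: it assumes Python 3, where urllib2 became urllib.request, and the names NoRedirect, fetch_once, REDIRECT_CODES, and the fetch and max_redirects parameters are all hypothetical. Automatic redirect handling is switched off so that the 30x code and the Location header can be inspected by hand.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Turn off automatic redirects so 30x responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_once(url, timeout=20):
    """Fetch url without following redirects; return (status, location, body)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(url, timeout=timeout)
        return resp.getcode(), None, resp.read()
    except urllib.error.HTTPError as err:
        # 30x (and other HTTP errors) land here; keep the Location header.
        return err.code, err.headers.get("Location"), b""
    except OSError:
        # Connection failures and timeouts: report "no status".
        return None, None, b""


def download(url, max_redirects=5, fetch=fetch_once):
    """Follow redirects by hand, recursing with a shrinking redirect budget."""
    status, location, body = fetch(url)
    if status in REDIRECT_CODES and location:
        if max_redirects <= 0:
            return b""  # too many redirects, possibly a loop
        # Location may be relative, so resolve it against the current URL.
        return download(urljoin(url, location), max_redirects - 1, fetch)
    return body if status == 200 else b""
```

Passing the fetcher in as a parameter is a design choice made here so the redirect logic can be exercised without network access; calling download(url) with the defaults goes through urllib.request.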

answered May 07 '12 at 12:06

JerabekJakub

score 0 · Answer 2 · edited May 23 '17 at 10:08

If a redirect (301/302) code is returned, urllib2 should follow that redirect automatically.

Look at this related question. If it does not follow the redirect in your case, this article examines redirect handling in detail.
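
One way to check whether a redirect was actually followed is to compare the URL you asked for with the one the response reports. A minimal sketch, assuming Python 3's urllib.request (the successor to urllib2); the function name open_following_redirects is made up for illustration:

```python
import urllib.request


def open_following_redirects(url, timeout=20):
    """Return (final_url, body); urlopen follows 301/302 on its own."""
    resp = urllib.request.urlopen(url, timeout=timeout)
    try:
        # geturl() reports the URL of the response that was finally retrieved.
        return resp.geturl(), resp.read()
    finally:
        resp.close()
```

If the returned final_url differs from the URL passed in, the request was redirected along the way.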

answered May 07 '12 at 12:07

Joseph Victor Zammit
