14

Calling urllib2.urlopen on a link to an article fetched from an RSS feed leads to the following error:

urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Moved Permanently

According to the documentation, urllib2 supports redirects.

On Java the problem was solved by just calling

HttpURLConnection.setFollowRedirects(true);

How can I solve it with Python?

UPDATE

The link I'm having problems with:

http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c

Alex
  • Redirects are on by default. Read the error message again. To determine if this is an error in the std-lib you would have to supply the URL, for checking. Note, that also the server could return different stuff based on the sent User-Agent. – sleeplessnerd Mar 29 '12 at 13:14
  • I've added the url. Looks like there are more than 5 redirects. But Java copes with them without any extra stuff like user agent. – Alex Mar 29 '12 at 13:15
  • Possible duplicate of [Python urllib2.urlopen returning 302 error even though page exists](http://stackoverflow.com/questions/4098702/python-urllib2-urlopen-returning-302-error-even-though-page-exists) – Krastanov Mar 16 '16 at 23:46

2 Answers

26

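It turns out you need to enable cookies. The page sets a cookie and then redirects to itself, so a client that never sends the cookie back is redirected again and again until urllib2 gives up. Because urllib2 does not handle cookies by default, you have to add a cookie handler yourself.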

import urllib2
from cookielib import CookieJar

# Store cookies and send them back on every request, including redirects.
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
p = opener.open("http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c")

print p.read()
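
The cookie-then-redirect behaviour can also be reproduced offline. The following is only a sketch, written for Python 3 (where urllib2 became urllib.request and cookielib became http.cookiejar). CookieGateHandler, run_demo, the cookie name "seen" and the /article path are all made up for illustration; the handler is a stand-in for the feed server, not a copy of what it really does.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical server: redirects to itself until the cookie comes back."""

    def do_GET(self):
        if "seen=1" in self.headers.get("Cookie", ""):
            body = b"article body"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Set the cookie and redirect to the very same URL.
            self.send_response(301)
            self.send_header("Location", self.path)
            self.send_header("Set-Cookie", "seen=1")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Fetch the same URL without and with cookie handling."""
    server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/article" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # ignore any system proxy
    try:
        try:
            # No cookie handler: the cookie is dropped, so the loop never ends.
            urllib.request.build_opener(no_proxy).open(url)
            plain = "ok"
        except urllib.error.HTTPError as err:
            plain = "HTTP %d" % err.code
        # With a cookie handler the second request carries the cookie.
        opener = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(CookieJar())
        )
        with_cookies = opener.open(url).read()
    finally:
        server.shutdown()
        server.server_close()
    return plain, with_cookies
```

Calling run_demo() returns a pair: the outcome of the plain request (the redirect-loop error) and the body fetched once cookies are kept.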
sleeplessnerd
9

Nothing wrong with @sleeplessnerd's solution, but this is very, very slightly more elegant:

import urllib2
url = "http://stackoverflow.com/questions/9926023/handling-rss-redirects-with-python-urllib2"
p = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)

print p.read()

In fact, if you look at the docstring of the CookieJar class, it more-or-less tells you to do things this way:

You may not need to know about this class: try urllib2.build_opener(HTTPCookieProcessor).open(url)

LondonRob
  • If implemented this way, is the cookiejar shared between subsequent requests? (Does HTTPCookieProcessor use a singleton-cache for the cookies, or a new cookie jar each time?) – owenfi Dec 18 '14 at 21:43
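
On the question of sharing: as far as the standard library goes, an HTTPCookieProcessor created without an argument makes its own CookieJar, so each build_opener(HTTPCookieProcessor) call gets a fresh jar. Cookies persist only if you keep reusing the same opener or pass the same jar in explicitly. A small sketch in Python 3 names (urllib.request, http.cookiejar); find_cookie_jar is a hypothetical helper written just for this check.

```python
import urllib.request
from http.cookiejar import CookieJar


def find_cookie_jar(opener):
    """Return the CookieJar attached to an opener, or None if there is none."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.HTTPCookieProcessor):
            return handler.cookiejar
    return None


# Passing the class makes build_opener instantiate it, and each instance
# created without an argument builds its own CookieJar.
opener_a = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)
opener_b = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)

# To share cookies between openers, hand both the same jar.
shared = CookieJar()
opener_c = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(shared))
opener_d = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(shared))
```

So the one-liner from the answer starts with an empty jar every time it is run; store the opener in a variable and call its open() repeatedly if later requests should see earlier cookies.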