I have a strange problem I've been trying to Google my way out of for several hours.
I've also tried solutions from similar topics on Stack Overflow, but still with no positive result:
How do I set cookies using Python urlopen?
Handling rss redirects with Python/urllib2
So the case is that I want to download a whole set of articles from some webpage. The sub-links with the actual content differ by just one number, so I loop over the whole range (1 to 400 000) and write the HTML to files. What's important here is that this webpage needs the cookies to be re-sent in order to reach the proper URL, and after reading How to use Python to login to a webpage and retrieve cookies for later usage? I have this part done.
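For reference, my setup looks roughly like this (the article URL pattern and file names are my own placeholders, the real site is different):

import http.cookiejar
import urllib.request

# Cookie-aware opener: the jar stores the Set-Cookie value and re-sends it
# automatically on every later request.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# First request only to receive the cookie.
opener.open("http://my.url/").close()

for article_id in range(1, 400001):
    response = opener.open("http://my.url/article.php?id=%d" % article_id)
    html = response.read()
    response.close()
    with open("article_%d.html" % article_id, "wb") as f:
        f.write(html)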
But sometimes my script raises this error:
    response = meth(req, response)
  File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
    'http', request, response, code, msg, hdrs)
  ....
  File "/usr/lib/python3.1/urllib/request.py", line 553, in http_error_302
    self.inf_msg + msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
This problem is hard to reproduce because the script generally works fine, but the error happens randomly after a few thousand iterations of the for loop.
Here is the curl output from the server:
$ curl -I "http://my.url/"
HTTP/1.1 200 OK
Date: Wed, 17 Oct 2012 10:14:13 GMT
Server: Apache/2.2.15 (Oracle)
X-Powered-By: PHP/5.3.3
Set-Cookie: Kuuxk=ae7s3isu2cEshhijte4nb1clk5; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=UTF-8
Some folks suggested using mechanize or catching the exception, but I have no clue how to do this. Others said the error is caused by wrong cookie handling, but I also tried to get and send the cookies 'manually', using urllib2 and add_header('Cookie', cookie), with a similar result.
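What I mean by 'manually' is roughly this, in Python 3 spelling (the header parsing is simplified and the article URL is a placeholder):

import urllib.request

# Grab the Set-Cookie value from the first response ourselves.
first = urllib.request.urlopen("http://my.url/")
cookie = first.info().get("Set-Cookie").split(";")[0]   # e.g. "Kuuxk=ae7s3i..."
first.close()

# Re-send it by hand on the next request.
req = urllib.request.Request("http://my.url/article.php?id=1")
req.add_header("Cookie", cookie)
response = urllib.request.urlopen(req)
html = response.read()
response.close()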
I wonder if my for loop and maybe a too-short sleep might cause the script to fail sometimes.
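If the pace is the problem, something like this is what I have in mind (opener is the cookie-aware opener from above; the 0.5 second value is just a guess):

import time

for article_id in range(1, 400001):
    response = opener.open("http://my.url/article.php?id=%d" % article_id)
    html = response.read()
    response.close()
    time.sleep(0.5)   # give the server a moment between requests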
Anyway, any help is appreciated.
edit:
In case this might work: how do I catch the exception and ignore it?
edit:
Solved by simply ignoring this error. Now everything works fine.
I used
try:
    ...  # here open url
except urllib.error.HTTPError:
    pass
around every place where I call urlopen.
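In practice I put the try/except into a small helper so the main loop stays readable (just a sketch of my own approach; the names are mine):

import urllib.error

def fetch(opener, url):
    # Return the page body, or None when the server answers with the
    # looping 302, so the main loop can simply skip that article.
    try:
        response = opener.open(url)
        html = response.read()
        response.close()
        return html
    except urllib.error.HTTPError:
        return None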
TO BE CLOSED.