I have a strange problem I've been trying to Google my way out of for several hours.
I've also tried solutions from similar topics on Stack Overflow, but still with no positive result:
How do I set cookies using Python urlopen?
Handling rss redirects with Python/urllib2
So the case is that I want to download a whole set of articles from some webpage. The sub-links with the actual content differ by just one number, so I loop over the whole range (1 to 400 000) and write the HTML to files. What's important here is that this webpage needs the cookies to be re-sent in order to reach the proper URL, and after reading How to use Python to login to a webpage and retrieve cookies for later usage? I have this part done.
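For reference, my setup looks roughly like this (the article URL pattern and file names are my own placeholders, the real site is different):

import http.cookiejar
import urllib.request

# Cookie-aware opener: the jar stores the Set-Cookie value and re-sends it
# automatically on every later request.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# First request only to receive the cookie.
opener.open("http://my.url/").close()

for article_id in range(1, 400001):
    response = opener.open("http://my.url/article.php?id=%d" % article_id)
    html = response.read()
    response.close()
    with open("article_%d.html" % article_id, "wb") as f:
        f.write(html)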
But sometimes my script raises this error:
    response = meth(req, response)
  File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
    'http', request, response, code, msg, hdrs)
  ....
  File "/usr/lib/python3.1/urllib/request.py", line 553, in http_error_302
    self.inf_msg + msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
This problem is hard to reproduce because the script generally works fine, but the error happens randomly after a few thousand iterations of the for loop.
Here is the curl output from the server:
$ curl -I "http://my.url/"
HTTP/1.1 200 OK
Date: Wed, 17 Oct 2012 10:14:13 GMT
Server: Apache/2.2.15 (Oracle)
X-Powered-By: PHP/5.3.3
Set-Cookie: Kuuxk=ae7s3isu2cEshhijte4nb1clk5; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=UTF-8
Some folks suggested using mechanize or catching the exception, but I have no clue how to do this. Others said the error is caused by wrong cookie handling, but I also tried to get and send the cookies 'manually', using urllib2 and add_header('Cookie', cookie), with a similar result.
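What I mean by 'manually' is roughly this, in Python 3 spelling (the header parsing is simplified and the article URL is a placeholder):

import urllib.request

# Grab the Set-Cookie value from the first response ourselves.
first = urllib.request.urlopen("http://my.url/")
cookie = first.info().get("Set-Cookie").split(";")[0]   # e.g. "Kuuxk=ae7s3i..."
first.close()

# Re-send it by hand on the next request.
req = urllib.request.Request("http://my.url/article.php?id=1")
req.add_header("Cookie", cookie)
response = urllib.request.urlopen(req)
html = response.read()
response.close()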
I wonder if my for loop and maybe a too-short sleep might cause the script to fail sometimes.
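If the pace is the problem, something like this is what I have in mind (opener is the cookie-aware opener from above; the 0.5 second value is just a guess):

import time

for article_id in range(1, 400001):
    response = opener.open("http://my.url/article.php?id=%d" % article_id)
    html = response.read()
    response.close()
    time.sleep(0.5)   # give the server a moment between requests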
Anyway, any help is appreciated.
edit:
In case this might work: how do I catch the exception and ignore it?
edit:
Solved by simply ignoring this error. Now everything works fine.
I used
try:
    ...  # here open url
except urllib.error.HTTPError:
    pass
around every place where I call urlopen.
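In practice I put the try/except into a small helper so the main loop stays readable (just a sketch of my own approach; the names are mine):

import urllib.error

def fetch(opener, url):
    # Return the page body, or None when the server answers with the
    # looping 302, so the main loop can simply skip that article.
    try:
        response = opener.open(url)
        html = response.read()
        response.close()
        return html
    except urllib.error.HTTPError:
        return None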
TO BE CLOSED.