python mechanize doesn't time out and gets stuck in opening a url

Question

I am trying to open a list of URLs with mechanize, and I set a timeout value so that URLs which do not open are skipped. However, when I come across the following URL, the call gets stuck and never times out. The site opens normally in a browser, so where can the problem be?

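import mechanize
browser = mechanize.Browser()  # assumed setup; not shown in the original post
url = 'http://www.gizmodo.it/2008/03/12/lo_scanner_di_impronte_digitali_che_distingue_un_dito_vivo_da_unomorto.html'
opener = browser.open(url, timeout=2)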

if you're on unix, you can [use this](http://stackoverflow.com/a/133384/1595865) to get a stack trace — loopbackbee, Dec 04 '13 at 13:31

falsetru · Accepted Answer · 2013-12-04T14:02:33.267

4

The page at the given URL responds with a Refresh: 185 header. This causes HTTPRefreshProcessor to sleep for 185 seconds, reload the same page, sleep again, and so on forever. The timeout argument only limits the network operations, so it does not interrupt that sleep.

You can turn off HTTPRefreshProcessor using the set_handle_refresh method as follows:

browser.set_handle_refresh(False) # <-----
browser.open(url, timeout=2.0)
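
To illustrate the decision a refresh handler has to make, here is a minimal standard-library sketch. It is not mechanize's own code: parse_refresh and should_follow are hypothetical helpers written for this example, and the example.com URL is a placeholder.

```python
def parse_refresh(value):
    """Split a Refresh header value such as '185' or '5; url=/next'
    into (delay_in_seconds, url_or_None)."""
    delay, _, rest = value.partition(";")
    rest = rest.strip()
    url = None
    if rest.lower().startswith("url="):
        # Drop the 'url=' prefix and any surrounding quotes.
        url = rest[4:].strip().strip("'\"")
    return float(delay.strip()), url


def should_follow(value, max_time):
    """Follow a refresh only if its delay is at most max_time seconds."""
    delay, _url = parse_refresh(value)
    return delay <= max_time


# The header from the question: a 185 second delay, no explicit target URL.
print(parse_refresh("185"))
# With a cap of 10 seconds, the long refresh is skipped...
print(should_follow("185", 10))
# ...while an immediate, redirect-style refresh is still followed.
print(should_follow("0; url=http://example.com/", 10))
```

As far as I know, mechanize's set_handle_refresh also accepts a max_time argument, for example browser.set_handle_refresh(True, max_time=5), which applies this kind of cap while still following short refreshes. Check this against the mechanize version you have installed.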

edited Dec 04 '13 at 14:02

answered Dec 04 '13 at 13:38


Perfect! But if I use that, won't I be unable to handle pages that do refresh or redirect? – hmghaly Dec 04 '13 at 13:50

@hmghaly, use another browser instance for the other pages if that matters. – falsetru Dec 04 '13 at 13:59
