
I am trying to open a list of URLs, and I set a timeout value so that URLs which do not open are skipped. However, when I reach the following URL, the call gets stuck and never times out. The site opens normally in a browser, so where could the problem be?

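import mechanize

browser = mechanize.Browser()  # assumption: the browser object is a mechanize.Browser
url = 'http://www.gizmodo.it/2008/03/12/lo_scanner_di_impronte_digitali_che_distingue_un_dito_vivo_da_unomorto.html'

opener = browser.open(url, timeout=2)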
hmghaly

1 Answer


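The page at the given URL responds with a "Refresh: 185" header. This causes HTTPRefreshProcessor to sleep for 185 seconds, reload the same page, sleep again, and so on forever. As far as I can tell, the timeout only covers the network request itself and not this sleep, which is why it never fires.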
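To make the mechanism concrete, here is a minimal sketch of how a Refresh header value can be read and checked against a time limit. It is only an illustration, not mechanize's actual code: parse_refresh and should_follow are hypothetical names, and the parsing is deliberately simplified.

```python
def parse_refresh(value):
    """Split a Refresh header value into (pause_seconds, url_or_None)."""
    pause, _, rest = value.partition(";")
    url = None
    rest = rest.strip()
    if rest.lower().startswith("url="):
        # Drop the "url=" prefix and any surrounding quotes.
        url = rest[4:].strip().strip("\"'")
    return float(pause.strip()), url


def should_follow(pause, max_time=None):
    """Follow the refresh only when no limit is set or the pause is within it."""
    return max_time is None or pause <= max_time


pause, url = parse_refresh("185")
print(pause, url)                          # 185.0 None
print(should_follow(pause))                # True
print(should_follow(pause, max_time=2.0))  # False
```

With no limit, a 185-second pause is honoured and the same page is requested again afterwards, which matches the hang described in the question.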

You can turn off HTTPRefreshProcessor using the set_handle_refresh method as follows:

browser.set_handle_refresh(False)  # do not follow Refresh headers
browser.open(url, timeout=2.0)
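Applied to the original goal of working through a list of URLs and skipping the ones that do not open, the fix might look like the sketch below. The helper name open_all is hypothetical, and catching OSError is an assumption (on Python 3, socket timeouts and URL errors derive from it); adjust the exception types to whatever your mechanize version raises.

```python
def open_all(browser, urls, timeout=2.0):
    """Return (opened, skipped) URL lists, skipping URLs that fail to open."""
    browser.set_handle_refresh(False)  # do not sleep on Refresh headers
    opened, skipped = [], []
    for url in urls:
        try:
            browser.open(url, timeout=timeout)
        except OSError:  # assumption: timeouts and URL errors derive from OSError
            skipped.append(url)
        else:
            opened.append(url)
    return opened, skipped

# Usage (assumes the mechanize package is installed):
#   import mechanize
#   opened, skipped = open_all(mechanize.Browser(), list_of_urls)
```

The function accepts any object with set_handle_refresh and open methods, so it can be tried out with a stub before pointing it at real sites.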
falsetru