I am scraping an Amazon page with requests. Even though I pass both a connect timeout and a response (read) timeout, some of the threads occasionally get stuck for hours. The snippet is below:

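import requests

timeout_connect = 10   # seconds allowed to establish the connection
timeout_response = 20  # seconds allowed to wait for data from the server

# url and headers are defined elsewhere
r = requests.get(url,
                 headers=headers,
                 timeout=(timeout_connect, timeout_response),
                 verify=False)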

Some requests do time out once this threshold is reached and no connection can be made. For a small number of requests, however, the thread stays stuck because the call takes hours to return.

Am I missing something here? And if this can happen, is there a way to track the requests call inside the thread so that, once it exceeds a given time, I can move on instead of waiting for it?

Tushar Seth
  • This might be a bug (maybe try the latest version), or the server does not like you and sends you one byte every 10s. I'd try to sniff such a connection to check. Check [this answer](https://stackoverflow.com/a/22096841/15862) for a workaround. – Tometzky Sep 21 '21 at 14:03
  • @Tometzky, any leads on how to check such a connection? – Tushar Seth Sep 22 '21 at 19:03
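
For context on the slow-server case raised in the comments: the read timeout in requests limits how long the socket may go without receiving any bytes, not the total download time, so a server that keeps trickling data may never trigger it. One possible way to "move forward" anyway is to run the call in a worker thread and wait on it with an overall deadline. The following is only a minimal sketch under assumptions, not a definitive fix: `fetch`, `call_with_deadline`, the pool size of 8, and the 60-second deadline in the usage note are all hypothetical names and values chosen for illustration.

```python
import concurrent.futures

import requests

# Shared worker pool (the size 8 is an arbitrary, assumed value).
# A stuck request keeps occupying one worker until it eventually returns.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def fetch(url, headers=None, timeout=(10, 20)):
    """Hypothetical helper: plain GET with (connect, read) timeouts."""
    return requests.get(url, headers=headers, timeout=timeout)


def call_with_deadline(func, deadline, *args, **kwargs):
    """Run func(*args, **kwargs) in the pool.

    Returns its result, or None if it has not finished within
    `deadline` seconds. Exceptions raised by func propagate to the caller.
    """
    future = _pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the call is still queued; a call that is
        # already running cannot be interrupted and finishes in the background.
        future.cancel()
        return None
```

Usage might look like `r = call_with_deadline(fetch, 60, url, headers=headers)`, followed by a check for `r is None` before moving on to the next URL. The caller is released after the deadline, but the abandoned request still holds its worker thread and connection until the server finishes or the socket times out, so a process-based approach may be preferable if stuck calls are frequent.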
