I wrote a crawler in Python to download web pages from a website, given a list of URLs. I noticed that occasionally my program hangs at the line conn.getresponse(): no exception is thrown, and the program simply waits there forever.
import httplib  # Python 2; this module is http.client in Python 3

# component is a urlparse.urlparse() result for the URL being fetched
conn = httplib.HTTPConnection(component.netloc)
conn.request("GET", component.path + "?" + component.query)
resp = conn.getresponse()  # hangs here
I read the API docs, which say that a timeout can be added like this:
conn = httplib.HTTPConnection(component.netloc, timeout=10)
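If I understand the docs correctly, with a timeout set the stalled call raises socket.timeout instead of blocking forever. Here is a minimal check of that behavior (the non-routable address is just a placeholder I picked to force a timeout):

import httplib
import socket

# 10.255.255.1 is a placeholder non-routable address, used only to force a timeout
conn = httplib.HTTPConnection("10.255.255.1", timeout=2)
try:
    conn.request("GET", "/")
    resp = conn.getresponse()
except socket.timeout:
    print "request timed out instead of hanging"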
However, the timeout alone does not let me "retry" the connection. What is the best practice for retrying the crawl after a timeout?
For example, I'm thinking of the following solution:
trials = 3
while trials > 0:
    try:
        ... code here ...
        break  # success, stop retrying
    except:
        trials -= 1
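Concretely, here is a fuller sketch of what I have in mind, catching socket.timeout specifically and opening a fresh connection on each attempt (fetch and its parameters are names I made up):

import httplib
import socket
from urlparse import urlparse

def fetch(url, trials=3, timeout=10):
    """Fetch url, retrying up to `trials` times on timeout."""
    component = urlparse(url)
    while trials > 0:
        # Reconnect on every attempt; a stalled connection is not reusable
        conn = httplib.HTTPConnection(component.netloc, timeout=timeout)
        try:
            conn.request("GET", component.path + "?" + component.query)
            return conn.getresponse().read()
        except socket.timeout:
            trials -= 1
        finally:
            conn.close()
    raise IOError("gave up on %s after repeated timeouts" % url)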
Am I headed in the right direction?