Currently, I am using concurrent.futures in Python to connect to (and analyse) multiple sites at the same time:

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = {executor.submit(analyser.analyse, name, aggregator, past, current):
                   (name, aggregator) for name, aggregator in aggregators.iteritems()}
        for future in concurrent.futures.as_completed(futures):
            records += future.result()

However, the futures sometimes get "stuck" on certain webpages, or at least that is my assumption. (More generally, the problem I am trying to solve is that when the script is launched from a cron job, it sometimes hangs and never finishes.)
What I want to do is implement a timeout for each future, so that if one exceeds its time limit, the same task is submitted to the pool again. This is what I have so far:

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = {executor.submit(analyser.analyse, name, aggregator, past, current):
                   (name, aggregator) for name, aggregator in aggregators.iteritems()}
        for future in concurrent.futures.as_completed(futures):
            try:
                records += future.result(timeout=30)
            except concurrent.futures.TimeoutError:
                if DEBUG:
                    print("Future took too long, retrying!")

Unfortunately, I cannot find a way to resubmit a future to the pool, since executor.submit() only accepts a callable and its arguments, not an existing future. Is there a Pythonic way of doing this?
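
To make the intent concrete, here is a rough sketch of the behaviour I am after, not a finished solution. It is written for Python 3 (`items()` rather than `iteritems()`). The helper name `run_with_retries` and its parameters `jobs`, `max_attempts` and `timeout` are hypothetical names made up for this example. Since a future cannot be resubmitted, the sketch remembers the arguments behind each future and submits the same call again. It uses `concurrent.futures.wait()` with a timeout instead of `as_completed()`, because `as_completed()` only yields futures that have already finished, so `future.result(timeout=30)` inside that loop returns immediately and never times out. Two caveats: the timeout applies to a whole round of submissions rather than to each call, and a thread that is already running cannot be interrupted, so a stuck call keeps occupying its worker until it returns.

```python
import concurrent.futures


def run_with_retries(func, jobs, timeout=30, max_attempts=3, max_workers=8):
    """Call func(*args) for every (key, args) pair in jobs, retrying slow calls.

    Returns (results, failed): results maps key -> return value, and
    failed lists the keys that never finished within `timeout` seconds.
    """
    results = {}
    remaining = dict(jobs)
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    try:
        for _ in range(max_attempts):
            if not remaining:
                break
            # A Future cannot be resubmitted, so remember which key produced
            # it and submit the same callable again in the next round.
            futures = {executor.submit(func, *args): key
                       for key, args in remaining.items()}
            done, not_done = concurrent.futures.wait(futures, timeout=timeout)
            for future in done:
                key = futures[future]
                results[key] = future.result()  # re-raises exceptions from func
                del remaining[key]
            for future in not_done:
                # cancel() only stops calls that have not started yet; a
                # running thread keeps its worker until the call returns.
                future.cancel()
    finally:
        # Do not block here on threads that are still stuck.
        executor.shutdown(wait=False)
    return results, list(remaining)
```

With the names from my code above, `jobs` would be `{name: (name, aggregator, past, current) for name, aggregator in aggregators.items()}`, `func` would be `analyser.analyse`, and the values of `results` would then be added to `records`. Because stuck threads are never actually killed, I suspect a timeout on the underlying network call would still be needed to stop the pool from filling up.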