I am using ThreadPoolExecutor from Python's concurrent.futures to parallelize scraping and writing the results to a database. When doing so, I realized that I do not get any information if one of the threads fails. How can I find out which threads fail and why (i.e., with the normal traceback)? Below is a minimal working example.
import logging
logging.basicConfig(format='%(asctime)s %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

from concurrent.futures import ThreadPoolExecutor

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

# uncomment the next line and you will get the usual traceback
# worker_bee(1)

# ThreadPoolExecutor will not provide any traceback
logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(0, 10):
        executor.submit(worker_bee, seed)
logging.info('submitted, waiting for threads to finish')
If I import logging inside worker_bee() and direct the messages to the root logger, I can see those messages in the final log. But I only ever see the log messages I define myself, not the traceback showing where the code actually fails.
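For context, one workaround I've been considering is to keep the Future objects that executor.submit() returns and call result() on each of them, since (as far as I understand) result() re-raises any exception that occurred in the worker. A sketch of that idea, adapted from my example above:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(format='%(asctime)s %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

with ThreadPoolExecutor(max_workers=4) as executor:
    # map each future back to the seed it was submitted with
    futures = {executor.submit(worker_bee, seed): seed for seed in range(10)}
    for future in as_completed(futures):
        seed = futures[future]
        try:
            future.result()  # re-raises the worker's exception here
        except Exception:
            # logging.exception attaches the full traceback to the message
            logging.exception('seed %s failed', seed)
```

This does print tracebacks in my tests, but it forces me to collect and iterate the futures myself, which is what I was hoping to avoid. Is there a cleaner, built-in way to be notified of failing threads?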