I'm trying to use Python's multiprocessing module to run an analysis on multiple samples in parallel. I'm using pool.map_async
to spawn the function (called crispr_analysis
) on a tuple of tuples for arguments (zipped_args
). Each tuple within zipped_args
is not empty, as that can cause multiprocessing to hang. Upon completion of the pool, it hangs and fails to move on to the rest of the script. I know that crispr_analysis
finishes as it creates output files (generated with with
statements, so they're closing properly); I can browse said files and they are complete. I never see the debug message for sorting the results, and the program never terminates.
try:
# Use map_async and get with a large timeout
# to allow for KeyboardInterrupts to be caught
# and handled with the try/except
timeout = max((9999, 600 * len(fastq_list)))
logging.debug("Setting timeout to %s seconds", timeout)
res = pool.map_async(crispr_analysis, zipped_args) # type: multiprocessing.pool.MapResult
pool.close()
results = res.get(timeout)
except (KeyboardInterrupt, ExitPool) as error: # Handle ctrl+c or custom ExitPool
pool.terminate()
# pool.join()
if isinstance(error, KeyboardInterrupt): # ctrl+c
sys.exit('\nkilled')
elif isinstance(error, ExitPool): # My way of handling SystemExits
sys.exit(error.msg)
else: # Shouldn't happen, but you know...
raise
except:
pool.terminate(); pool.join()
raise
else:
pool.join()
try:
logging.debug("Sorting results into alignments and summaries")
sort_start = time.time() # type: float
alignments, summaries = zip(*results) # type: Tuple[Tuple[alignment.Alignment]], Tuple[Dict[str, Any]]
logging.debug("Sorting results took %s seconds", round(time.time() - sort_start, 3))
except ExitPool as error: # Handle ExitPool calls for single-threaded map
sys.exit(error.msg)
Does anyone have any idea why multiprocessing is hanging and how I can fix it?
Extra information:
- I'm using Python 2.7.8 on CentOS 7.3.1611; platform and Python version are not changeable
crispr_analysis
returns a tuple and dictionary, that are each either empty or have some length based on the inputs- I have tried omitting the
pool.join()
statements, to no avail ExitPool
is an error that I throw to stop the entire pool in place ofSystemExit
s; multiprocessing normally swallowsSystemExit
s, but I want them to bubble up- This entire snippet is called from within a function (called
main
) This analysis program is called from an easy-install entry script, where the entry point is the
main
function that starts the multiprocessing pool#!/usr/bin/python # EASY-INSTALL-ENTRY-SCRIPT: 'EdiTyper==1.0.0','console_scripts','EdiTyper' __requires__ = 'EdiTyper==1.0.0' import sys from pkg_resources import load_entry_point if __name__ == '__main__': sys.exit( load_entry_point('EdiTyper==1.0.0', 'console_scripts', 'EdiTyper')() )