I have been dabbling with Python's multiprocessing library, and although it provides an incredibly easy-to-use API, its documentation is not always very clear. In particular, I find the 'maxtasksperchild' argument passed to an instance of the Pool class very confusing.
The following comes directly from Python's documentation (3.7.2):
maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.
The above raises more questions for me than it answers. Is it bad for a worker process to live as long as the pool? What makes a worker process 'fresh' and when is that desired? In general, when should you set the value for maxtasksperchild explicitly instead of letting it default to 'None' and what are considered best practices in order to maximize processing speed?
From @Darkonaut's amazing answer on chunksize, I now understand what chunksize does and represents. Since supplying a value for chunksize impacts the number of 'tasks', are there any considerations regarding how chunksize and maxtasksperchild interact that should be taken into account to ensure maximum performance?
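To make the question concrete, here is a minimal sketch of one way the interaction could be observed, rather than a definitive test. It uses a single-worker pool and records which process ID handled each item, so a replaced worker shows up as a new PID. The helper names `report_pid` and `worker_pids` and the `start_method` parameter are hypothetical and exist only for this illustration. It rests on the assumption that each chunk handed to a worker counts as one task towards maxtasksperchild, which matches my reading of CPython's implementation but is not spelled out in the quoted documentation.

```python
import multiprocessing as mp
import os


def report_pid(_item):
    """Return the PID of the worker process that handled this item."""
    return os.getpid()


def worker_pids(n_items, chunksize, maxtasksperchild, start_method=None):
    """Map n_items over a single-worker pool and return the PIDs that ran them.

    Hypothetical helper for illustration only.
    start_method=None selects the platform's default start method.
    """
    ctx = mp.get_context(start_method)
    # One worker only, so every new PID means the worker was replaced.
    with ctx.Pool(processes=1, maxtasksperchild=maxtasksperchild) as pool:
        return pool.map(report_pid, range(n_items), chunksize=chunksize)


if __name__ == "__main__":
    # Compare: no limit, a limit with one item per chunk,
    # and the same limit with two items per chunk.
    for chunksize, maxtasks in [(1, None), (1, 2), (2, 2)]:
        pids = worker_pids(4, chunksize, maxtasks)
        print(f"chunksize={chunksize}, maxtasksperchild={maxtasks}: "
              f"{len(set(pids))} distinct worker PID(s)")
```

If the assumption holds, a larger chunksize means fewer tasks, so a worker processes more items before it is replaced, and the two settings cannot be tuned independently.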
Thanks!