I wrote the following helper function to run arbitrary functions in parallel.
    import multiprocessing

    def runParallel(fns=[], args=[]):
        print('Starting multiprocessing with %i cores' % (multiprocessing.cpu_count() - 1))
        pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
        for fn, arg in zip(fns, args):
            pool.apply_async(fn, (arg,))
        pool.close()
        pool.join()
I call the function with an itertools.repeat of the function and a list of filenames:

    runParallel(itertools.repeat(self.processFile), fileNamesAndPaths)
processFile is an instance method with the signature

    def processFile(self, filename):

and starts with a print statement that is never executed. The program just ends after the output "Starting multiprocessing with 3 cores".
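One thing worth checking: pool.apply_async swallows any exception raised in the worker (or while pickling the target) unless get() is called on the AsyncResult it returns. A minimal sketch of that diagnostic, using a hypothetical module-level function shout in place of processFile:

```python
import itertools
import multiprocessing

def shout(word):
    # Hypothetical stand-in for processFile; module-level so it can be pickled
    return word.upper()

def runParallelChecked(fns, args):
    pool = multiprocessing.Pool(processes=2)
    # Keep the AsyncResult objects instead of discarding them
    results = [pool.apply_async(fn, (arg,)) for fn, arg in zip(fns, args)]
    pool.close()
    pool.join()
    # get() re-raises any exception that occurred in the worker,
    # instead of letting the program end silently
    return [r.get() for r in results]

if __name__ == '__main__':
    print(runParallelChecked(itertools.repeat(shout), ['a', 'b']))  # prints ['A', 'B']
```

If the target cannot be pickled, the get() call is where the error would finally show up instead of being lost.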
Using Process from multiprocessing works in general, but it floods my CPU with more processes than it can handle and eventually freezes. At least the processFile function is called, though:
    from multiprocessing import Process

    def runParallel(fns=[], args=[]):
        proc = []
        for fn, arg in zip(fns, args):
            p = Process(target=fn, args=(arg,))
            p.start()
            proc.append(p)
        for p in proc:
            p.join()
This is why I wanted to use Pool, since, as I understand it, it limits the number of processes running at any given time.
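That understanding can be checked with a short sketch (using a hypothetical module-level mark function): the pool reuses a fixed set of worker processes, so the number of distinct worker PIDs never exceeds the processes argument, no matter how many tasks are submitted:

```python
import multiprocessing
import os

def mark(_):
    # Each task just reports which worker process ran it
    return os.getpid()

def countWorkers(nTasks, nProcs):
    pool = multiprocessing.Pool(processes=nProcs)
    pids = pool.map(mark, range(nTasks))
    pool.close()
    pool.join()
    # Distinct PIDs == distinct worker processes actually used
    return len(set(pids))

if __name__ == '__main__':
    print(countWorkers(30, 3))  # at most 3, however many tasks are queued
```

This is the behavior the raw Process loop lacks: there, every task gets its own process immediately.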
If it is helpful: I am running Python 2.7.10 on a 64-bit Windows machine.