I have a parallelized task that reads from multiple files and writes the resulting information out to several files.
The idiom I currently use to parallelize the work:
    listOfProcesses = []
    for fileToBeRead in listOfFilesToBeRead:
        # args must be a tuple: (fileToBeRead) is just a parenthesized
        # expression, so a trailing comma is required
        process = multiprocessing.Process(
            target=somethingThatReadsFromAFileAndWritesSomeStuffOut,
            args=(fileToBeRead,))
        process.start()
        listOfProcesses.append(process)

    for process in listOfProcesses:
        process.join()
It is worth noting that somethingThatReadsFromAFileAndWritesSomeStuffOut
might itself parallelize tasks (it may have to read from other files, etc.).
Now, as you can see, the number of processes created doesn't depend on the number of cores my machine has, or on anything else except how many tasks need to be completed: if ten tasks need to run, ten processes are created, and so on.
Is this the best way to create tasks? Should I instead take into account how many cores my processor has?
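For context, a common alternative to one process per task is a `multiprocessing.Pool`, which caps the number of worker processes (by default at `os.cpu_count()`) and queues the remaining tasks. A minimal sketch, where `process_file` is a hypothetical stand-in for the real worker:

```python
import multiprocessing

def process_file(fileToBeRead):
    # Hypothetical worker standing in for
    # somethingThatReadsFromAFileAndWritesSomeStuffOut;
    # here it just returns the filename length.
    return len(fileToBeRead)

def run_all(listOfFilesToBeRead, workers=None):
    # Pool limits concurrency to `workers` processes
    # (defaults to os.cpu_count() when None) and
    # distributes the tasks among them.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(process_file, listOfFilesToBeRead)

if __name__ == "__main__":
    print(run_all(["a.txt", "bb.txt"]))
```

One caveat relevant here: `Pool` workers are daemonic processes and cannot spawn child processes of their own, so if the worker really does parallelize internally, a bounded set of hand-managed `Process` objects (or a semaphore limiting how many run at once) may be a better fit.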