I have 800 data files to process, which is enough that I want to use multiprocessing, but I don't think I'm doing it correctly.
Inside my main() function I'm trying to spin off one process for each file that needs processing. (I'm guessing this is not a good idea, because my computer won't be able to handle 800 concurrent processes, but I haven't gotten that far yet.)
Here is my main():

    manager = multiprocessing.Manager()
    arr = manager.list()

    def main():
        count = 0
        with open("loc.csv") as loc_file:
            locs = csv.reader(loc_file, delimiter=',')
            for loc in locs:
                if count != 0:
                    process = multiprocessing.Process(target=sort_run, args=[loc])
                    process.start()
                    process.join()
                count += 1
And then the code that is the target of each process:

    def sort_run(loc):
        start_time = time.time()
        sorted_list = sort_splits.sort_splits(loc[0])
        value = process_reads.count_coverage(sorted_list, loc[0])
        arr.append([loc[0], value])
I'm using multiprocessing.Manager() so that all of my processes can append to the shared arr list. When I run this, I get the error:
An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
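For context, the full CPython text of this error goes on to mention the `if __name__ == '__main__':` idiom. It typically appears when child processes are not started with fork (the spawn start method is the default on Windows and macOS): each child re-imports the main module, and any module-level code that itself starts a process, such as `multiprocessing.Manager()`, runs again during that import. Below is a minimal sketch of a guarded layout, not a drop-in fix. The body of `sort_run` and the file names are hypothetical stand-ins for the real work, and passing the shared list as an argument is an assumption about how the code could be arranged.

```python
import multiprocessing


def sort_run(path, results):
    # Hypothetical stand-in for the real per-file work
    # (sort_splits / count_coverage in the question).
    value = len(path)
    results.append([path, value])


def main():
    # Hypothetical file names; the real code would read them from loc.csv.
    paths = ["a.txt", "bb.txt", "ccc.txt"]

    # The manager is created inside main(), not at import time, so a child
    # that re-imports this module does not try to start a manager itself.
    with multiprocessing.Manager() as manager:
        results = manager.list()
        processes = [
            multiprocessing.Process(target=sort_run, args=(path, results))
            for path in paths
        ]
        for process in processes:
            process.start()
        for process in processes:
            process.join()
        # Copy the proxy into a plain list before the manager shuts down.
        return sorted(list(results))


if __name__ == "__main__":
    # Only the original process runs this; re-importing children skip it.
    print(main())
```

The shared list is passed to the worker explicitly because, under spawn, a child does not inherit the parent's module-level objects.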
I think what's happening is that the loop runs too fast for the processes to spin up correctly. Or maybe each process needs its own variable, not just "process = ..." reused on every iteration.
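On the loop itself: calling `join()` immediately after `start()` waits for each process to finish before the next one starts, so the files would be handled one at a time regardless of loop speed. One common way to bound concurrency instead of launching 800 processes is `multiprocessing.Pool`. The sketch below is written under assumptions: `read_paths` is a hypothetical helper, the body of `sort_run` is a stand-in for the real work, the pool size of 4 is arbitrary, and an in-memory string stands in for loc.csv so the example runs on its own.

```python
import csv
import io
import multiprocessing


def read_paths(csv_file):
    """Return the first column of every row after the first row."""
    rows = csv.reader(csv_file, delimiter=",")
    next(rows, None)  # skip the first row, as the count check in the question does
    return [row[0] for row in rows if row]


def sort_run(path):
    # Hypothetical stand-in for sort_splits / count_coverage.
    # Returning the result avoids the need for a shared manager list.
    return [path, len(path)]


def main():
    # In-memory stand-in for loc.csv (assumption, for a runnable example).
    loc_file = io.StringIO("path\nfirst.txt\nsecond.txt\n")
    paths = read_paths(loc_file)

    # At most 4 worker processes exist at once; the pool hands each worker
    # a new path as soon as it finishes the previous one.
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(sort_run, paths)


if __name__ == "__main__":
    print(main())
```

`pool.map` returns the results in the same order as the input paths, so the parent can collect them into an ordinary list.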