Python multiprocessing, spinning off processes inside a for loop

Question

I have 800 files with some data to process. It's enough that I want to use multiprocessing for this, but I don't think I'm doing it correctly.

Inside my main() function I'm trying to spin off one process for each file that needs processing. (I'm guessing this is not a good idea, because my computer won't be able to handle 800 concurrent processes, but I haven't gotten that far yet.)

Here is my main():

manager = multiprocessing.Manager()
arr = manager.list()

def main():
    count = 0
    with open("loc.csv") as loc_file:
        locs = csv.reader(loc_file, delimiter=',')
        for loc in locs:
            if count != 0:
                process = multiprocessing.Process(target=sort_run, args=[loc])
                process.start()
                process.join()
            count += 1

And then my code that is the target of the process:

def sort_run(loc):
    start_time = time.time()
    sorted_list = sort_splits.sort_splits(loc[0])
    value = process_reads.count_coverage(sorted_list, loc[0])
    arr.append([loc[0], value])

I'm using multiprocessing.Manager() so that all of my processes can access the arr list properly. I received the error:

An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

I think what's happening is that the loop is too fast to spin off the processes correctly. Or maybe each process has to have its own variable, not just "process = ...".

I would not recommend spinning up a separate process for each of the 800 files, as that would most likely eat away at your computer's CPU and subsequently slow things down. Instead, I would look into an asynchronous solution with [`asyncio`](https://docs.python.org/3/library/asyncio.html) and [`aiofiles`](https://github.com/Tinche/aiofiles). Both modules will allow you to process all of these files in the same thread, while still being fast and saving your computer a lot of resources. — felipe, Nov 02 '19 at 19:47
Regarding your question directly, I would recommend checking out [this](https://stackoverflow.com/questions/24374288/where-to-put-freeze-support-in-a-python-script) Stack Overflow answer and this [GitHub issue](https://github.com/dask/distributed/issues/516#issuecomment-306468605) for more context on why the error is happening.
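
For context on the error itself: on platforms that use the spawn start method (Windows, and macOS by default since Python 3.8), each child process re-imports the main module. A module-level multiprocessing.Manager() call, which itself starts a process, therefore runs again while the child is still bootstrapping, and that is the likely trigger here. Below is a minimal sketch of one way around both problems, assuming the per-file work can return its result instead of appending to a shared list. It guards the entry point with if __name__ == "__main__": and hands the files to a bounded multiprocessing.Pool. The body of sort_run is a placeholder for the real sort_splits / count_coverage calls, and read_locations, run_all and the demo file names are hypothetical names introduced only for illustration.

```python
import csv
import multiprocessing


def read_locations(csv_path):
    """Return the first column of every row after the header row."""
    with open(csv_path, newline="") as loc_file:
        reader = csv.reader(loc_file, delimiter=",")
        next(reader, None)  # skip the header, like the `count != 0` check
        return [row[0] for row in reader if row]


def sort_run(loc):
    """Stand-in worker; the real one would call sort_splits and count_coverage."""
    value = len(loc)  # placeholder computation, NOT the real coverage count
    return [loc, value]


def run_all(locs, workers=None):
    """Process every location with a bounded pool and collect the results."""
    # With workers=None the pool sizes itself to the available CPUs, so
    # 800 inputs do not turn into 800 simultaneous processes.
    with multiprocessing.Pool(processes=workers) as pool:
        # map() blocks until all work is done and keeps the input order.
        return pool.map(sort_run, locs)


if __name__ == "__main__":
    # Nothing above starts a process at import time, so a child that
    # re-imports this module does not try to start processes of its own.
    demo_locs = ["chr1_a.txt", "chr2_bb.txt", "chr3_ccc.txt"]  # hypothetical
    for loc, value in run_all(demo_locs, workers=2):
        print(loc, value)
```

To feed it the real list, something like run_all(read_locations("loc.csv")) inside the guard should work. Because pool.map returns the results directly, the Manager list is not needed in this version. Note also that calling process.join() immediately after process.start(), as in the original loop, waits for each process to finish before starting the next, so the files are effectively handled one at a time.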
