
I am trying to efficiently parallelize a Python loop across n threads, and I am getting a bit confused as to what the best method would be. Additional wrinkles are that each thread needs to write to a dictionary (never to the same key, though), and that each thread has to perform 24/n iterations of the loop (though I am pretty sure most of the Python libraries will take care of that distribution for me).

The code (simplified):

n=<number of threads input by user>
mySets=[ str(x) for x in range(1,25) ]
myDict={}

# Start of parallelization
for set in mySets:

    # Performs actions on the set
    # Calls external C++ code on the set and gets a result back
    # Processes the result
    myDict[set] = result

# End of parallelization

# Process the results for output

I'm in a Unix environment, but ideally the solution would also work on Windows and Mac. The rest of my code is portable; I don't really want this piece to break that.

I saw this thread: Parallelize a loop in python 2.4, but I don't think fork is what I want, since I'd like the user to specify the number of workers available.

I also looked at the multiprocessing library, which I am pretty sure is what I want, but it seems like everyone puts their code into a function, which I'd like to avoid... it's a LOT of code and it would be messy.

I also saw joblib, but I am unclear what the difference is between it and the multiprocessing library, and what the benefit of one over the other would be.
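
From the joblib docs it looks like my loop would become something like this (process_set here is just a placeholder for whatever I do to a single set; I haven't verified this is the right usage for my case):

from joblib import Parallel, delayed

# run process_set on each set across n worker processes
results = Parallel(n_jobs=n)(delayed(process_set)(s) for s in mySets)
myDict = dict(zip(mySets, results))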

Thanks for any help!

bnp0005

1 Answer


You can use multiprocessing.pool.Pool.

Here is some pseudo code:

from multiprocessing.pool import Pool


def do_something(n, sets):
    out = dict()

    with Pool(processes=n) as pool:
        # cpp_computation_function stands for the function that wraps
        # your C++ call and result processing for a single set
        results = pool.map(cpp_computation_function, sets)
        for set_name, result in zip(sets, results):
            out[set_name] = result

    return out
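
Calling it from your script would look roughly like this; the if __name__ == "__main__" guard is needed because multiprocessing spawns fresh interpreter processes on Windows, which matters for your portability requirement:

if __name__ == "__main__":
    n = 4  # number of workers chosen by the user
    mySets = [str(x) for x in range(1, 25)]
    myDict = do_something(n, mySets)
    # process myDict for output here
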
amirouche