
I would like to parallelize a Python script. I've created a function:

def dummy(list1, list2):
  # do useful calculations ...

list1 and list2 contain the names of files that I should read and then use for the calculations. The files are independent of each other, and both lists contain the same number of elements.

Let us assume I have 2 CPUs (I want to control the number of CPUs used). I would like the first CPU to call the function with lists containing only the first half of list1 and list2, while at the same time the second CPU calls the same dummy function with the second half of list1 and list2.

Something like:

import multiprocessing

nb_cpus = 2
pool = multiprocessing.Pool(processes=nb_cpus)
results = {}
for ii in range(nb_cpus):
  # give each CPU its own half of the lists
  list_half1 = list1[0:len(list1) // nb_cpus]
  list_half2 = list2[0:len(list2) // nb_cpus]
  results[ii] = pool.map(dummy, list_half1, list_half2)

The problem is that pool.map only works with a function that takes a single argument, and I cannot loop over the CPUs like this.

Thank you for any help with this problem!

PS: It is not possible for me to concatenate the two arguments into one, because in my real use case I'm passing many more arguments.

sponce

1 Answer


First, you don't need to split your lists yourself; multiprocessing.Pool will do it for you (it chops the iterable into chunks for the workers, and you can tune that batching with map's optional chunksize argument).

To pass multiple arguments to your function as a single argument, you just need to zip the lists together, like this:

import multiprocessing

def myFunction(arguments):
    # unpack the (item1, item2) pair produced by zip()
    item1, item2 = arguments
    ...

if __name__ == '__main__':
    # the guard lets worker processes re-import this module safely
    # on platforms that spawn rather than fork
    nb_cpus = 2
    pool = multiprocessing.Pool(processes=nb_cpus)
    results = pool.map(myFunction, zip(list1, list2))
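
If you are on Python 3.3 or newer, pool.starmap avoids the manual unpacking step entirely: it unpacks each tuple from the zipped iterable into separate positional arguments, which scales naturally to the many-arguments case mentioned in the PS. A minimal, self-contained sketch, with hypothetical file names standing in for the real ones:

import multiprocessing

def dummy(file1, file2):
    # read the two files and do the useful calculations here;
    # this stub just echoes its arguments back
    return (file1, file2)

if __name__ == '__main__':
    # hypothetical file lists standing in for the real ones
    list1 = ['a1.txt', 'a2.txt', 'a3.txt', 'a4.txt']
    list2 = ['b1.txt', 'b2.txt', 'b3.txt', 'b4.txt']

    nb_cpus = 2
    with multiprocessing.Pool(processes=nb_cpus) as pool:
        # starmap unpacks each (file1, file2) tuple into
        # separate positional arguments
        results = pool.starmap(dummy, zip(list1, list2))
    print(results)

With starmap, passing more arguments later just means zipping more lists together; the function keeps a plain signature like def dummy(file1, file2, ...).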
Charles Brunet