
I have a function that processes millions of lines of data and I want to make it faster with multiprocessing.
Until now I have only used Pool with map, like this:

from multiprocessing.dummy import Pool  # note: dummy.Pool uses threads, not processes

pool = Pool(20)
pool.map(my_function, data_list)

But my function takes two parameters: a list of companies and a dictionary with CSV files as its values.
So my question is: how can I use multiprocessing with my function?

Taylan Aydinli
Michael
    See http://stackoverflow.com/q/5442910/553404 and http://stackoverflow.com/q/4463275/553404 – YXD Dec 09 '13 at 15:03

2 Answers


You can use a list (or tuple) to bundle all the arguments; the function then takes just one argument and unpacks it inside, as sketched below.
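
For example, a minimal sketch of that idea with hypothetical data, where each item of the iterable is one (companies, csv_dict) pair:

from multiprocessing import Pool

# Hypothetical worker: unpacks its two real arguments from one tuple.
def my_function(args):
    companies, csv_dict = args
    # ... do the per-item work here ...
    return [(c, csv_dict.get(c)) for c in companies]

if __name__ == '__main__':
    data_list = [(['acme', 'globex'], {'acme': 'acme.csv'}),
                 (['initech'], {'initech': 'initech.csv'})]
    with Pool(4) as pool:
        results = pool.map(my_function, data_list)
    print(results)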

flx90

Have a look at the definition of Pool.map():

def map(self, func, iterable, chunksize=None):

Semantically, Pool.map() applies the function to each item of an iterable, so the function must take exactly one argument whose value comes from the iterable; any other parameters must already be bound to fixed values.

So there are a few solutions:

  • Use functools.partial() to fix the other arguments of the function, as described in Python multiprocessing pool.map for multiple arguments (see the first sketch after this list)
  • Use a list or tuple as the function's single argument, encapsulating everything the function needs (as in the first answer above)
  • Don't use Pool.map(). Create a multiprocessing.Process for each chunk of work yourself, then start and join the processes (see the second sketch after this list)
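
A minimal sketch of the functools.partial approach, assuming (hypothetically) that the dictionary is the argument you want to hold fixed while the companies vary:

from functools import partial
from multiprocessing import Pool

# Hypothetical worker: one varying argument plus one fixed dictionary.
def process_company(company, csv_dict):
    return company, csv_dict.get(company)

if __name__ == '__main__':
    csv_dict = {'acme': 'acme.csv', 'initech': 'initech.csv'}
    companies = ['acme', 'initech', 'globex']
    # partial() freezes csv_dict, so the resulting callable takes one argument.
    worker = partial(process_company, csv_dict=csv_dict)
    with Pool(4) as pool:
        results = pool.map(worker, companies)
    print(results)

Note that partial objects pickle fine as long as the wrapped function is defined at module level, which multiprocessing requires anyway.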
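
And a sketch of the last option, again with hypothetical names; with raw multiprocessing.Process objects you start and join the workers yourself and collect results through a queue:

from multiprocessing import Process, Queue

def worker(companies, csv_dict, out_queue):
    # Hypothetical per-chunk work; send results back via the queue.
    out_queue.put([(c, csv_dict.get(c)) for c in companies])

if __name__ == '__main__':
    csv_dict = {'acme': 'acme.csv', 'initech': 'initech.csv'}
    chunks = [['acme', 'globex'], ['initech']]
    out_queue = Queue()
    procs = [Process(target=worker, args=(chunk, csv_dict, out_queue))
             for chunk in chunks]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on large results.
    results = [out_queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(results)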
flyer