I have a pandas DataFrame with many rows, and I am using multiprocessing to process grouped tables from this DataFrame concurrently. It works fine, but I have a problem passing in a second parameter: I also want to pass the parameter "col" to the function "process_table". I have tried to pass both arguments as a tuple, but it doesn't work.

My code is as follows:

from multiprocessing import Pool

def process_table(t):
    # Bunch of processing to create a line for matplotlib
    return line

for col in cols:
    tables = df.groupby('test')
    p = Pool()
    lines = p.map(process_table, tables)
    p.close()
    p.join()
Darkonaut
Supez38
  • Use `starmap` (Python 3.3+) as shown [here](https://stackoverflow.com/a/53805285/9059420). You don't need to rebuild your Pool on every iteration; create it once outside the loop and reuse it. – Darkonaut Feb 11 '19 at 19:15
  • @Darkonaut Is there an alternative for Python 2.7? – Supez38 Feb 11 '19 at 19:17
  • Yes, see [here](https://stackoverflow.com/a/52671399/9059420) – Darkonaut Feb 11 '19 at 19:19
  • @Darkonaut Thanks, I worked that one out but I am getting an error for "p" which is the pool object. After debugging for a little, it looks like the second parameter "col" which in this case is a string "pnl" comes into the helper function unwrapped as 'p' in the first iteration and then 'n', then 'l' and then I get the error – Supez38 Feb 11 '19 at 19:39
  • Look at my first linked answer and how to make the arguments the same length. – Darkonaut Feb 11 '19 at 19:43
  • @Darkonaut Works great, thanks for the help! – Supez38 Feb 11 '19 at 19:52
  • Possible duplicate of [Python 2.7: How to compensate for missing pool.starmap?](https://stackoverflow.com/questions/52651506/python-2-7-how-to-compensate-for-missing-pool-starmap) – Darkonaut Feb 12 '19 at 15:30
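
The string-unpacking problem described in the comments ("pnl" arriving as 'p', 'n', 'l') can be reproduced and avoided with a small sketch like the one below. It assumes Python 3.3+ for `starmap`, and it uses plain lists of dicts as a stand-in for the grouped tables so that it runs without pandas. The bodies of `process_table` and `build_args` are hypothetical and only illustrate the argument handling; `itertools.repeat` is one way to make the second argument as long as the first.

```python
from itertools import repeat
from multiprocessing import Pool


def process_table(table, col):
    # Hypothetical stand-in for the real processing: sum one column.
    return sum(row[col] for row in table)


def build_args(tables, col):
    # zip(tables, col) would iterate the string "pnl" as 'p', 'n', 'l'.
    # repeat(col) hands the whole string to every call instead.
    return list(zip(tables, repeat(col)))


if __name__ == "__main__":
    # Assumed sample data standing in for df.groupby('test').
    tables = [
        [{"pnl": 1.0}, {"pnl": 2.0}],
        [{"pnl": 3.0}],
    ]
    with Pool(2) as pool:  # create the pool once and reuse it
        totals = pool.starmap(process_table, build_args(tables, "pnl"))
    print(totals)
```

Each tuple produced by `build_args` becomes one call `process_table(table, col)`, so every worker receives the full column name.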

1 Answer

You could use a helper function that takes a tuple of arguments and expands it into individual arguments:

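def expand(x):
    # x is one (table, col) tuple; unpack it into positional arguments
    return process_table(*x)

# Each item must be a tuple of arguments, not a bare table
p.map(expand, [(t, col) for t in tables])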
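
Put together, the helper approach might look like the following sketch. It avoids `starmap`, so it should also work on Python 2.7, and it builds the pool once rather than once per column. The data, the body of `process_table`, and the `run` function are assumptions made for illustration; plain lists of dicts stand in for the grouped tables.

```python
from multiprocessing import Pool


def process_table(table, col):
    # Hypothetical stand-in for the real processing: largest value in a column.
    return max(row[col] for row in table)


def expand(args):
    # Unpack one (table, col) tuple into positional arguments.
    return process_table(*args)


def run(tables, cols, processes=2):
    # Build the pool once and reuse it for every column.
    pool = Pool(processes)
    try:
        results = {}
        for col in cols:
            results[col] = pool.map(expand, [(t, col) for t in tables])
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    # Assumed sample data standing in for df.groupby('test').
    tables = [
        [{"pnl": 1, "qty": 10}, {"pnl": 5, "qty": 7}],
        [{"pnl": 3, "qty": 2}],
    ]
    print(run(tables, ["pnl", "qty"]))
```

Because `expand` is defined at module level, it can be pickled and sent to the worker processes.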

You might be tempted to do this:

p.map(lambda x: process_table(*x), table) # DOES NOT WORK

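But it won't work: multiprocessing pickles the function in order to send it to the worker processes, and lambdas cannot be pickled. Pickle stores a function by its importable name, and a lambda doesn't have one.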
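
The pickling limitation can be checked without starting a pool. This is a small sketch; `double` and `can_pickle` are hypothetical names introduced for the demonstration.

```python
import pickle


def double(x):
    return 2 * x


def can_pickle(func):
    # Return True if pickle can serialize func, False otherwise.
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print(can_pickle(double))            # named module-level function
    print(can_pickle(lambda x: 2 * x))   # anonymous function
```

Run as a script, the first call should report that the named function pickles, while the lambda does not.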

Benoît P