8

I am currently doing a merge over a set of variables that I'd like to parallelize. My code looks something like this:

mergelist = [
  ('leftfile1', 'rightfile1', 'leftvarname1', 'outputname1'),
  ('leftfile1', 'rightfile1', 'leftvarname2', 'outputname2'),
  ('leftfile2', 'rightfile2', 'leftvarname3', 'outputname3')
]

def merger(leftfile, rightfile, leftvarname, outvarname):
    do_the_merge

for m in mergelist:
    merger(*m)

Ordinarily, to speed up long loops, I would replace the for m in mergelist loop with something like:

from multiprocessing import Pool

p = Pool(8)
p.map(merger(m), mergelist)
p.close()

But since I'm using the star to unpack the tuple, it's not clear to me how to map this correctly. How do I get the *m?

Mittenchops
  • The existence of `itertools.starmap` seems to imply that a standard `map` can't easily be coerced to do this. You can, of course, create a wrapper function that forwards to the actual implementation with the sole argument unpacked. –  Feb 26 '14 at 20:47
  • How are you defining `merger`? You can't specify string literals as formal parameter names in a function definition. – chepner Feb 26 '14 at 20:51
  • Thanks chepner, I corrected that; careless pseudocoding. I need a better pseudocode interpreter. ;) – Mittenchops Feb 26 '14 at 20:59

3 Answers

4

Use lambda:

with Pool(8) as p:
    p.map(lambda m:merger(*m), mergelist)
ndpu
  • same idea, but simpler & less typing. a much better solution. – Corley Brigman Feb 26 '14 at 20:58
  • I slightly redact my love of lambda. I'm pretty sure this can't be distributed because lambdas can't be pickled: PicklingError: Can't pickle : attribute lookup __builtin__.function failed – Mittenchops Feb 26 '14 at 22:59
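
Because a lambda cannot be pickled, the version above fails once the work is actually sent to worker processes. On Python 3.3 and later, `Pool.starmap` unpacks each tuple into positional arguments, so no wrapper is needed. A minimal sketch, assuming a stand-in `merger` that only echoes its arguments (the real merge logic is not shown in the question) and a hypothetical helper named `run_merges`:

```python
from multiprocessing import Pool

# Stand-in for the real merge (an assumption): it only echoes its arguments.
def merger(leftfile, rightfile, leftvarname, outvarname):
    return (leftfile, rightfile, leftvarname, outvarname)

mergelist = [
    ('leftfile1', 'rightfile1', 'leftvarname1', 'outputname1'),
    ('leftfile1', 'rightfile1', 'leftvarname2', 'outputname2'),
    ('leftfile2', 'rightfile2', 'leftvarname3', 'outputname3'),
]

# Hypothetical helper: run every job in a pool of worker processes.
def run_merges(jobs, processes=2):
    with Pool(processes) as p:
        # starmap calls merger(*job) for each job, preserving input order.
        return p.starmap(merger, jobs)

if __name__ == '__main__':
    print(run_merges(mergelist))
```

The `if __name__ == '__main__':` guard matters on platforms where worker processes are started by re-importing the main module.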
1

The simplest solution IMHO is to change the merger function, or add a wrapper:

def merger(leftfile, rightfile, leftvarname, outvarname):
    do_the_merge

def merger_wrapper(wrapper_tuple):
    # Unpack the single tuple argument into merger's four parameters.
    return merger(*wrapper_tuple)

p.map(merger_wrapper, mergelist)

I see @delnan actually also put this solution in the comments.
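
Put together, the wrapper approach might look like the sketch below. It assumes `merger` and `merger_wrapper` are defined at module level, which is what lets `multiprocessing` pickle them by name. The body of `merger` here is a placeholder that only formats its arguments.

```python
from multiprocessing import Pool

def merger(leftfile, rightfile, leftvarname, outvarname):
    # Placeholder for the real merge (an assumption): describe the job.
    return '{0} + {1} on {2} -> {3}'.format(
        leftfile, rightfile, leftvarname, outvarname)

def merger_wrapper(wrapper_tuple):
    # Pool.map passes one item per call, so unpack it here.
    return merger(*wrapper_tuple)

if __name__ == '__main__':
    mergelist = [
        ('leftfile1', 'rightfile1', 'leftvarname1', 'outputname1'),
        ('leftfile1', 'rightfile1', 'leftvarname2', 'outputname2'),
        ('leftfile2', 'rightfile2', 'leftvarname3', 'outputname3'),
    ]
    p = Pool(2)
    try:
        results = p.map(merger_wrapper, mergelist)
    finally:
        # Stop accepting work and wait for the workers to exit.
        p.close()
        p.join()
    for line in results:
        print(line)
```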

To add a little value to this :) You could also wrap it like this:

def unpack_wrapper(f):
    # Return a one-argument function that unpacks its tuple into f's parameters.
    def unpack(arg):
        return f(*arg)
    return unpack

This should let you simplify the call to:

p.map(unpack_wrapper(merger), mergelist)
Corley Brigman
  • This seems to produce errors about being unable to pickle the function. – Mittenchops Feb 26 '14 at 23:07
  • interesting... while investigating, i found this: http://stackoverflow.com/questions/11287455/how-do-i-avoid-this-pickling-error-and-what-is-the-best-way-to-parallelize-this and this: http://stackoverflow.com/questions/19310536/adding-state-to-a-function-which-gets-called-via-pool-map-how-to-avoid-pickli . maybe they will help? – Corley Brigman Feb 26 '14 at 23:19
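
The pickling error arises because `unpack` is defined inside another function, so worker processes cannot look it up by name. One possible workaround, sketched here for Python 3, keeps the generic idea but builds the wrapper from `functools.partial` and a module-level helper. The name `call_unpacked` and the body of `merger` are illustrative assumptions, not part of the original answer.

```python
import pickle
from functools import partial
from multiprocessing import Pool

def call_unpacked(f, arg):
    # Module-level helper: unpack one tuple into f's positional parameters.
    return f(*arg)

def unpack_wrapper(f):
    # A partial of a module-level function pickles, unlike a nested closure.
    return partial(call_unpacked, f)

def merger(leftfile, rightfile, leftvarname, outvarname):
    # Placeholder for the real merge (an assumption): echo the arguments.
    return (leftfile, rightfile, leftvarname, outvarname)

if __name__ == '__main__':
    mergelist = [
        ('leftfile1', 'rightfile1', 'leftvarname1', 'outputname1'),
        ('leftfile1', 'rightfile1', 'leftvarname2', 'outputname2'),
        ('leftfile2', 'rightfile2', 'leftvarname3', 'outputname3'),
    ]
    wrapped = unpack_wrapper(merger)
    pickle.dumps(wrapped)  # raises if the wrapper cannot be sent to workers
    with Pool(2) as p:
        print(p.map(wrapped, mergelist))
```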
1

You can unpack the tuple in your merge function:

def merger(args):
    if len(args) != 4:
        raise ValueError("merger expects a tuple of 4 items")
    leftfile, rightfile, leftvarname, outvarname = args
    do_the_merge

The other option is to unpack in the argument list (Python 2 only; tuple parameters were removed in Python 3):

def merger( (leftfile,rightfile,leftvarname,outvarname) ):
    do_the_merge

Edit: to address the OP's concerns:

def merger((l,r,v,o)):
    return l+r

for m in mergelist:
    print merger(m)

prints

leftfile1rightfile1
leftfile1rightfile1
leftfile2rightfile2
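
The tuple-parameter syntax and the `print` statement above work only on Python 2. A rough Python 3 equivalent, unpacking inside the function body instead, might look like this; the `results` list is added here only to make the output easy to inspect:

```python
mergelist = [
    ('leftfile1', 'rightfile1', 'leftvarname1', 'outputname1'),
    ('leftfile1', 'rightfile1', 'leftvarname2', 'outputname2'),
    ('leftfile2', 'rightfile2', 'leftvarname3', 'outputname3'),
]

def merger(args):
    # Unpack the single tuple argument in the body (valid on Python 2 and 3).
    l, r, v, o = args
    return l + r

results = [merger(m) for m in mergelist]
for line in results:
    print(line)
```

Because this `merger` takes one argument and lives at module level, it can also be passed to `Pool.map` directly.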
Daniel