
I have the following function and dictionary comprehension:

def function(name, params):
    # fits.open presumably comes from astropy.io.fits
    results = fits.open(name)
    <do something more to results>
    return results

dictionary = {name: function(name, params) for name in nameList}

and would like to parallelize this. Any simple way to do this?

Here I have seen that the multiprocessing module can be used, but I could not understand how to make it pass my results to my dictionary.

NOTE: If possible, please give an answer that can be applied to any function that returns a result.

NOTE 2: the function mainly manipulates the FITS file and assigns the results to a class

UPDATE

So here's what worked for me in the end (from @code_onkel's answer):

def function(name, params):
    results = fits.open(name)
    <do something more to results>
    return results

def function_wrapper(args):
    return function(*args)

params = [...,...,..., etc]

p = multiprocessing.Pool(processes=max(2, multiprocessing.cpu_count() // 10))
args_generator = ((name, params) for name in names)

dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))

Using tqdm only worked partially: I could not use my custom bar, since tqdm reverts to a default bar that shows only the iteration count.

jorgehumberto
  • Seems like you could use `Pool.map` over `nameList`, return 2 tuples `(name, function(name, params))` and then concatenate the results and use it to create a dict ... – mgilson Jun 21 '16 at 17:27
  • The term is "dict comprehension", not "comprehensive dictionary". – user2357112 Jun 21 '16 at 18:15

1 Answer


The dictionary comprehension itself cannot be parallelized. Here is an example of how to use the multiprocessing module with Python 2.7.

from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

def function_wrapper(args):
    return function(*args)

names = list('onecharNAmEs')

p = multiprocessing.Pool(3)
args_generator = ((name, params) for name in names)
dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))
print(dictionary)
p.close()

This works with any function, though the restrictions of the multiprocessing module apply. Most importantly, the classes passed as arguments, the return values, and the function to be parallelized itself have to be defined at the module level, otherwise the (de)serializer will not find them. The wrapper function is necessary because function() takes two arguments, but Pool.map() can only handle functions with one argument (like the built-in map() function with a single iterable).

With Python 3.3 or later, this can be simplified by using the Pool as a context manager and the starmap() method.

import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

names = list('onecharnamEs')

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    dictionary = dict(zip(names, p.starmap(function, args_generator)))

print(dictionary)

This is a more readable version of the with block:

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    results = p.starmap(function, args_generator)
    name_result_tuples = zip(names, results)
    dictionary = dict(name_result_tuples)

The Pool.map() function is for functions with a single argument; that is why the Pool.starmap() function was added in 3.3.

code_onkel
  • Doesn't seem to work for me, I'm getting an `AttributeError: __exit__` on line 14 (beginning of the with block) when running your code, any idea why? Also, on a different machine I get `AttributeError: 'Pool' object has no attribute 'starmap'` – jorgehumberto Jun 21 '16 at 17:48
  • The use of the Pool as a context manager and the starmap() were added in version 3.3. I will edit the answer. – code_onkel Jun 21 '16 at 17:52
  • btw, any way tqdm can be implemented with map? I tried `dictionary = dict(zip(names, tqdm(p.map(function_wrapper, args_generator))))`, but no chance. – jorgehumberto Jun 21 '16 at 18:42
  • I think that is worth a separate question, but first have a look [here](http://stackoverflow.com/questions/32172763/progress-measuring-with-pythons-multiprocessing-pool-and-map-function/) – code_onkel Jun 21 '16 at 18:52