
I was wondering if there is any way to run several threads and assign the result of each thread to a specific key in a dict. Something like this:

from joblib import Parallel, delayed
from math import sqrt
dict_of_sqrt = {}
i = {'a':1,'b':2,'c':3,'e':4}
dict_of_sqrt[k] = Parallel(n_jobs=2)(delayed(sqrt)(v) for k, v in i.items())

The result should be the dictionary with the same keys and assigned new values calculated in parallel:

dict_of_sqrt = {'a': 1, 'b': 1.41, 'c': 1.73, 'e': 2}

It should be safe, because I am writing to different keys (without overlapping). However, I have not found an example.
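In other words, something like this sketch, where the keys are zipped back onto the results afterwards (assuming Parallel returns results in input order, which I believe it does):

from joblib import Parallel, delayed
from math import sqrt

i = {'a': 1, 'b': 2, 'c': 3, 'e': 4}

# run the sqrt calls in parallel; the n-th result
# should then belong to the n-th key
results = Parallel(n_jobs=2)(delayed(sqrt)(v) for v in i.values())
dict_of_sqrt = dict(zip(i.keys(), results))
print(dict_of_sqrt)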

Igor Markelov

3 Answers


If you are doing an I/O-bound operation then you can use threads, but if you are doing a CPU-intensive operation you must take the GIL into account, so I would use processes instead.

If you are using the multiprocessing library, I think Pool is the best object for you, since you want to control the number of workers. To execute the work you can use either map or apply_async with a callback function. You can read more about map vs apply_async here: Python multiprocessing.Pool: when to use apply, apply_async or map?
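For completeness, here is a minimal sketch of the apply_async + callback route. The callback runs in the parent process, so it can safely fill the dict there; work and collect are just illustrative names:

import multiprocessing
from math import sqrt

def work(k, v):
    # runs in a worker process
    return k, sqrt(v)

def main():
    d = {'a': 1, 'b': 2, 'c': 3, 'e': 4}
    results = {}

    def collect(pair):
        # runs in the parent process as each result arrives
        k, val = pair
        results[k] = val

    pool = multiprocessing.Pool(2)
    for k, v in d.items():
        pool.apply_async(work, (k, v), callback=collect)
    pool.close()
    pool.join()  # wait for all workers to finish
    print(results)

if __name__ == '__main__':
    main()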

I like to use a Queue object to return data to the parent process, as it is multiprocessing/thread safe, but it does have the overhead of draining the queue and processing the results.

Here is a quick example of a basic usage.

import multiprocessing
from queue import Empty
from math import sqrt
import string


def my_sqrt(args):
    # each work item is a (key, value, queue) tuple
    k, v, q = args
    q.put({k: sqrt(v)})

def main():

    p = multiprocessing.Pool(2)
    m = multiprocessing.Manager()
    # a Manager queue can be shared with Pool workers,
    # unlike a plain multiprocessing.Queue
    q = m.Queue()

    # This is just to generate data.
    l = [(k, (v + 1)**2, q) for v, k in enumerate(string.ascii_lowercase)]
    p.map(my_sqrt, l)

    # Drain the queue into a dict.
    d = dict()
    try:
        while 1:
            item = q.get_nowait()
            d.update(item)
    except Empty:
        pass

    print(d)

if __name__ == '__main__':
    main()
asafm

Updated

from multiprocessing import Pool
from time import sleep
from random import choice
import math


# function that may take up to 5 seconds
def myroot(t):
    sleep(choice(range(5)))
    return {t[0]: math.sqrt(t[1])}

if __name__ == '__main__':
    d = {'a': 1, 'b': 2, 'c': 3, 'e': 4}
    # turn the dict into a list of (key, value) work items
    dt = list(d.items())
    # spawn 3 processes
    pool = Pool(3)
    # feed every (key, value) pair into myroot
    results = pool.map(myroot, dt)
    # merge the one-entry dict returned by each call back into d
    for r in results:
        d.update(r)
    print(d)
taesu

You can have the values calculated in parallel and returned as (key, value) tuple pairs. In the calling process you then only need to convert the list of pairs to a dictionary.

from math import sqrt
from multiprocessing import Pool

i = {'a': 1, 'b': 2, 'c': 3, 'e': 4}

# pass me to `starmap` (Python 3)
def sqrtval(k, v):
    return k, sqrt(v)

# pass me to `map` (Python 2, whose Pool has no starmap)
def sqrtval_py2(kv):
    k, v = kv
    return k, sqrt(v)

if __name__ == '__main__':
    tup = Pool().starmap(sqrtval, i.items())
    # tup = Pool().map(sqrtval_py2, i.items())

    print(tup)
    print(dict(tup))

Python 2 does not have a starmap method on multiprocessing.Pool, so there sqrtval has to accept a single (key, value) tuple and unpack it inside the mapping function. The Python 2 alternative, sqrtval_py2, is provided above as well.

Paul Rooney