Python: how to parallelize a loop with dictionary

Question

EDITED: I have code that looks like this:

__author__ = 'feynman'

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def MC_Surface(volume, mc_vol):

    Perm_area = {
        "00000000": 0.000000,
        "11111111": 0.000000,
         ...
         ...
        "11100010": 1.515500,
        "00011101": 1.515500
    }

    cdef int j, i, k
    for k in range(volume.shape[2] - 1):
        for j in range(volume.shape[1] - 1):
            for i in range(volume.shape[0] - 1):

                pattern = '%i%i%i%i%i%i%i%i' % (
                    volume[i, j, k],
                    volume[i, j + 1, k],
                    volume[i + 1, j, k],
                    volume[i + 1, j + 1, k],
                    volume[i, j, k + 1],
                    volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1],
                    volume[i + 1, j + 1, k + 1])

                mc_vol[i, j, k] = Perm_area[pattern]

    return mc_vol
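For testing on small inputs, the same computation can be written in plain Python/NumPy without Cython. This is a minimal sketch, not the poster's code: `mc_surface_reference` is a hypothetical name, and `toy_area` is a hypothetical stand-in for the full Perm_area dictionary that holds only the four entries visible above, so it only works on volumes whose cells produce those patterns.

```python
import numpy as np


def mc_surface_reference(volume, area):
    """Plain-Python version of MC_Surface: one string pattern and one dict lookup per cell."""
    nx, ny, nz = volume.shape
    mc_vol = np.zeros((nx - 1, ny - 1, nz - 1))
    for k in range(nz - 1):
        for j in range(ny - 1):
            for i in range(nx - 1):
                # Same corner order as the Cython loop
                pattern = '%i%i%i%i%i%i%i%i' % (
                    volume[i, j, k], volume[i, j + 1, k],
                    volume[i + 1, j, k], volume[i + 1, j + 1, k],
                    volume[i, j, k + 1], volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1], volume[i + 1, j + 1, k + 1])
                mc_vol[i, j, k] = area[pattern]
    return mc_vol


# Hypothetical toy table: only the entries shown in the question
toy_area = {"00000000": 0.0, "11111111": 0.0,
            "11100010": 1.5155, "00011101": 1.5155}

# One cell whose corners spell "11100010"
vol = np.zeros((2, 2, 2), dtype=int)
vol[0, 0, 0] = vol[0, 1, 0] = vol[1, 0, 0] = vol[1, 0, 1] = 1
print(mc_surface_reference(vol, toy_area))
```

Running this on a few small random volumes (with a complete table) gives a reference to compare faster versions against.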

Hoping to speed it up, I modified it to:

    Perm_area = {
      ...
      ...
      "11100010": 1.515500,
      "00011101": 1.515500
    }
    # list() is needed on Python 3, where keys()/values() return views
    keys = np.array(list(Perm_area.keys()))
    values = np.array(list(Perm_area.values()))

    starttime = time.time()
    tmp_vol = GetPattern(volume)
    print('time to populate the key array: ', time.time() - starttime)

    cdef int i
    starttime = time.time()
    # One full-array comparison per dictionary key
    for i, this_key in enumerate(keys):
        mc_vol[tmp_vol == this_key] = values[i]

    print('time for the loop: ', time.time() - starttime)
    return mc_vol

def GetPattern(volume):
    a = volume.astype(int).astype(str)  # one '0'/'1' character per voxel
    output = a.copy()  # Central voxel
    output[:, :-1, :] = np.char.add(output[:, :-1, :], a[:, 1:, :])  # East
    output[:-1, :, :] = np.char.add(output[:-1, :, :], a[1:, :, :])  # South
    output[:-1, :-1, :] = np.char.add(output[:-1, :-1, :], a[1:, 1:, :])  # SouthEast
    output[:, :, :-1] = np.char.add(output[:, :, :-1], a[:, :, 1:])  # Down
    output[:, :-1, :-1] = np.char.add(output[:, :-1, :-1], a[:, 1:, 1:])  # DownEast
    output[:-1, :, :-1] = np.char.add(output[:-1, :, :-1], a[1:, :, 1:])  # DownSouth
    output[:-1, :-1, :-1] = np.char.add(output[:-1, :-1, :-1], a[1:, 1:, 1:])  # DownSouthEast
    output = output[:-1, :-1, :-1]
    del a
    return output
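Because every key in Perm_area is an 8-character string of 0s and 1s, each pattern can also be read as an 8-bit binary number: `int(pattern, 2)` gives a value from 0 to 255. The sketch below is an alternative technique, not the original code. It takes the same eight shifted views as GetPattern but combines them with bit shifts into a `uint8` array, and replaces the dictionary by a 256-entry lookup table indexed with that array. It assumes `volume` contains only 0s and 1s; `get_codes` and `build_lut` are hypothetical names.

```python
import numpy as np


def get_codes(volume):
    """Integer counterpart of GetPattern: one 8-bit code per cell."""
    v = volume.astype(np.uint8)
    # Corner views in the same order as the pattern string, most significant bit first
    corners = [
        v[:-1, :-1, :-1],  # (i,   j,   k)   central
        v[:-1, 1:, :-1],   # (i,   j+1, k)   east
        v[1:, :-1, :-1],   # (i+1, j,   k)   south
        v[1:, 1:, :-1],    # (i+1, j+1, k)   south-east
        v[:-1, :-1, 1:],   # (i,   j,   k+1) down
        v[:-1, 1:, 1:],    # (i,   j+1, k+1) down-east
        v[1:, :-1, 1:],    # (i+1, j,   k+1) down-south
        v[1:, 1:, 1:],     # (i+1, j+1, k+1) down-south-east
    ]
    codes = np.zeros(corners[0].shape, dtype=np.uint8)
    for bit, corner in enumerate(corners):
        codes |= corner << (7 - bit)
    return codes


def build_lut(area):
    """256-entry table with lut[int(pattern, 2)] == area[pattern]."""
    lut = np.zeros(256)
    for pattern, value in area.items():
        lut[int(pattern, 2)] = value
    return lut
```

With these, the whole string-building step and the key loop would collapse to `mc_vol = build_lut(Perm_area)[get_codes(volume)]`. One difference to keep in mind: a pattern missing from the dictionary maps to 0.0 here, whereas the original code would raise a KeyError.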

This takes longer for a 3D array of size, say, 500^3. Here, tmp_vol is a 3D array of strings. For example, if tmp_vol[0,0,0] = "00000000", then mc_vol[0,0,0] = 0.000000. Alternatively, I could get rid of mc_vol and write: if tmp_vol[0,0,0] = "00000000", then tmp_vol[0,0,0] = 0.000000.
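Mapping the strings in tmp_vol to values does not require one full-array comparison per key. A sketch of one alternative (not from the original post): `np.unique` with `return_inverse=True` finds the distinct patterns once, the dictionary is queried once per distinct pattern, and fancy indexing scatters the values back to their positions. `lookup_patterns` is a hypothetical name.

```python
import numpy as np


def lookup_patterns(tmp_vol, area):
    """Replace every pattern string in tmp_vol by area[pattern]."""
    # Flattening first keeps the inverse indices 1-D across NumPy versions
    uniq, inverse = np.unique(tmp_vol.ravel(), return_inverse=True)
    # At most 256 dictionary lookups, one per distinct pattern
    values = np.array([area[str(u)] for u in uniq], dtype=float)
    # Each element of tmp_vol picks the value of its own pattern
    return values[inverse].reshape(tmp_vol.shape)
```

For example, `mc_vol = lookup_patterns(tmp_vol, Perm_area)`. The sort inside `np.unique` still touches every element, but only once, instead of once per key.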

The for loop takes a lot of time, and I see that only one CPU is used. I tried to run it in parallel using map and lambda but ran into errors. I am really new to Python, so any hints would be great.
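On the map-and-lambda attempt: multiprocessing sends the function to the worker processes by pickling it, and lambdas cannot be pickled, which is a common source of such errors. Below is a sketch of one way to split the work across processes. This is an assumption about how the job could be divided, not the poster's code: each worker gets a slab of the volume along the first axis, overlapping the next slab by one plane so every cell still sees its i+1 neighbours, and the worker is a module-level function. It computes 8-bit integer codes (same corner order as the pattern strings) instead of strings; `slab_codes` and `parallel_codes` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

# (di, dj, dk) corner offsets in the same order as the pattern string
OFFSETS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
           (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]


def slab_codes(slab):
    """8-bit pattern code for every cell of one slab (module-level, so picklable)."""
    v = slab.astype(np.uint8)
    nx, ny, nz = v.shape
    codes = np.zeros((nx - 1, ny - 1, nz - 1), dtype=np.uint8)
    for bit, (di, dj, dk) in enumerate(OFFSETS):
        codes |= v[di:di + nx - 1, dj:dj + ny - 1, dk:dk + nz - 1] << (7 - bit)
    return codes


def parallel_codes(volume, n_workers=3):
    """Split along axis 0 into overlapping slabs and encode them in worker processes."""
    n_cells = volume.shape[0] - 1
    bounds = np.linspace(0, n_cells, n_workers + 1).astype(int)
    # Cells start..stop-1 need planes start..stop, i.e. one plane of overlap
    slabs = [volume[start:stop + 1] for start, stop in zip(bounds[:-1], bounds[1:])]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(slab_codes, slabs))
    return np.concatenate(parts, axis=0)
```

Call `parallel_codes(volume, n_workers=4)` from a script guarded by `if __name__ == "__main__":`, which is required on platforms that start workers by spawning. Each slab is pickled to its worker and each result pickled back, and the per-cell work here is cheap, so it is worth timing this against a single-process version before relying on it.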

Could you explain in more detail what your code actually does? What is tmp_vol? – leeladam, May 05 '15 at 18:07
Thank you for your reply! tmp_vol holds the keys of the dictionary Perm_area = { "00000000": 0.000000, ... ... ... "11100010": 1.515500, "00011101": 1.515500 }. After tmp_vol is populated with the keys, each key is to be replaced by the corresponding value in Perm_area at its position in the 3D array. For example, if tmp_vol[0,0,0] = "00000000", then mc_vol[0,0,0] = 0.000000. Alternatively, if tmp_vol[0,0,0] = "00000000", then setting tmp_vol[0,0,0] = 0.000000 is also fine. – pyculearn, May 06 '15 at 10:10

score 1 · Answer 1 · answered May 06 '15 at 11:40


Since I don't quite understand your code and you said "any hints will be great", I'll give you some general suggestions. Basically, you want to speed up this for loop:

for i, this_key in enumerate(keys):

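What you could do is split the keys array, together with the matching values, into several parts, something like this:

length = len(keys)
third = length // 3  # integer division, since slice bounds must be ints
part1 = keys[:third], values[:third]
part2 = keys[third:2 * third], values[third:2 * third]
part3 = keys[2 * third:], values[2 * third:]

Then deal with each part in a subprocess. A child process works on its own copy of the data, so each worker has to return its result instead of writing into mc_vol:

from concurrent.futures import ProcessPoolExecutor

import numpy as np

def do_work(tmp_vol, keys, values):
    partial = np.zeros(tmp_vol.shape)
    # zip pairs each key with its own value, whichever part it came from
    for this_key, this_value in zip(keys, values):
        partial[tmp_vol == this_key] = this_value
    return partial

with ProcessPoolExecutor(max_workers=3) as e:
    futures = [e.submit(do_work, tmp_vol, k, v) for k, v in (part1, part2, part3)]
    # keys are unique, so each voxel is set in at most one partial result
    mc_vol = sum(f.result() for f in futures)

return mc_vol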

And that's it.
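The manual split into thirds generalizes to any number of parts. A small sketch using `np.array_split`, which, unlike `np.split`, accepts a part count that does not divide the length evenly. `n_parts` is a hypothetical parameter, and the keys and values are just the four entries visible in the question.

```python
import numpy as np

keys = np.array(["00000000", "11111111", "11100010", "00011101"])
values = np.array([0.0, 0.0, 1.5155, 1.5155])

n_parts = 3
# Split keys and values identically so every part keeps its own key/value pairs
key_parts = np.array_split(keys, n_parts)
value_parts = np.array_split(values, n_parts)

for k, v in zip(key_parts, value_parts):
    print(list(k), list(v))
```

With 4 keys and 3 parts, the first part gets two keys and the others one each.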

laike9m

Thank you. I am trying it out right now. I included the full code in my question. – pyculearn May 06 '15 at 13:42
I tried splitting it into parts as you suggested. Memory usage keeps growing and eventually the program runs out of memory. – pyculearn May 07 '15 at 14:39
Then your program may have other problems. I don't have much time to fully understand your program, but using multiprocessing would not cause such an error if your program is correct. – laike9m May 08 '15 at 01:54

score 0 · Answer 2 · edited May 23 '17 at 12:06


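First, a dictionary lookup takes roughly constant time, whereas an array equality check such as tmp_vol == this_key is O(N) in the number of elements, and the loop performs one such check per key. Thus you should loop over your array, not over your dictionary.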

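Second, you could save some time with a nested list comprehension, which runs faster than an equivalent explicit for loop (although it is still executed by the Python interpreter, not compiled to a C loop).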

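# tmp_vol is 3D, so three levels of nesting are needed to reach the individual keys
mc_vol = [[[Perm_area[key] for key in row] for row in plane] for plane in tmp_vol]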

This gives you nested lists, so you can avoid NumPy entirely in this case. If you do need a NumPy array, just convert:

mc_vol = np.array(mc_vol)
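A variant of the same idea that skips the intermediate nested lists: iterate over the flattened array once with `np.fromiter` and reshape. It still performs one Python-level dictionary lookup per element, so it is not vectorized; it only avoids building lists first. The small `area` and `tmp_vol` below are made up for illustration from the entries shown in the question.

```python
import numpy as np

area = {"00000000": 0.0, "11111111": 0.0, "11100010": 1.5155, "00011101": 1.5155}
tmp_vol = np.array([[["00000000", "11100010"],
                     ["00011101", "11111111"]]])

# One dictionary lookup per element, written straight into a float array
mc_vol = np.fromiter((area[key] for key in tmp_vol.ravel()),
                     dtype=float, count=tmp_vol.size).reshape(tmp_vol.shape)
```

Passing `count` lets NumPy allocate the output once instead of growing it.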


answered May 06 '15 at 11:42

leeladam


The vectorized version takes a long time as well and eventually runs out of memory. The for-loop version seems fine on memory but is slow, though not much slower than the vectorized version. – pyculearn May 07 '15 at 20:35
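A rough count shows why the string-based versions run out of memory. NumPy stores fixed-width Unicode strings at 4 bytes per character, so an 8-character pattern occupies 32 bytes, versus 1 byte for a `uint8` code. The sketch assumes a 500^3 input, which gives a 499^3 pattern array. It counts only the final arrays, not the temporaries created by `np.char.add` in GetPattern.

```python
import numpy as np

n_cells = 499 ** 3  # pattern array for a 500^3 volume
str_bytes = np.dtype("<U8").itemsize * n_cells       # 8 chars x 4 bytes each
uint8_bytes = np.dtype(np.uint8).itemsize * n_cells  # one 8-bit code per cell
mask_bytes = np.dtype(bool).itemsize * n_cells       # one tmp_vol == this_key result
float_bytes = np.dtype(np.float64).itemsize * n_cells  # mc_vol as float64

print(f"<U8 pattern array: {str_bytes / 1e9:.2f} GB")
print(f"uint8 code array:  {uint8_bytes / 1e9:.2f} GB")
print(f"one boolean mask:  {mask_bytes / 1e9:.2f} GB")
print(f"float64 mc_vol:    {float_bytes / 1e9:.2f} GB")
```

This comes to about 3.98 GB for the string array alone, against about 0.12 GB for integer codes. Copying that string array into each worker process multiplies it further.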
