Nested for loops using multiprocessing

Question

I have a quick question regarding multiprocessing in python.

I am conducting a rather large grid search over three parameters and the computation is taking ~14 hours to complete. I would like to shrink this run time down by using multiprocessing.

A very simplified example of my code is here:

import numpy as np
import pickle
import time

a_range = np.arange(14, 18, 0.2)
b_range = np.arange(1000, 5000, 200)
c_range = np.arange(12, 21, .5)

a_position = range(len(a_range))
b_position = range(len(b_range))
c_position = range(len(c_range))

data_grid = np.zeros([len(a_range), len(b_range), len(c_range)])
record_data = []

start_time = time.time()

for (a,apos) in zip(a_range, a_position):
    for (b, bpos) in zip(b_range, b_position):
        for (c, cpos) in zip(c_range, c_position):
            example = a+b+c  #The math in my model is much more complex and takes
            #about 7-8 seconds to process
            data_grid[apos, bpos, cpos] = example
            record_data.append([a, b, c, example])

with open('Test_File', 'wb') as f: 
    pickle.dump(record_data, f) 

np.save('example_values', data_grid) 

print 'Code ran for ', round(time.time()-start_time,2), ' seconds'

Now, I have absolutely zero experience in multiprocessing so my first attempt at this was changing the for loops into a function and then calling the multiprocessing function like this:

def run_model(a, b, c, apos, bpos, cpos):
    example=a+b+c  
    data_grid[apos, bpos, cpos]=example
    record_data.append([a, b, c, example])

from multiprocessing import Pool

if __name__=='__main__':
    pool=Pool(processes=4)
    pool.map(run_model, [a_range, b_range, c_range, a_position, b_positon, c_positon])
    pool.close()
    pool.join()

This failed however at the pool.map call. I understand this function only takes a single iterable argument but I don't know how to fix the problem. I am also skeptical that the data_grid variable is going to be filled correctly. The result I want from this function is two files saved, one as an array of values whose indexes correspond to a, b, and c values and the last a list of lists containing the a, b, c values and the resulting value (example in the code above)

Thanks for any help!

-Will

I think [```numpy.meshgrid```](http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html) would help but I can't try it out right now. Have a look at it. — wwii, Jun 29 '16 at 22:53
Just a comment concerning how you use Pool and map: i guess you might need to return values (instead of appending values to `record_data`) from your `run_model` function and fetch the result in a variable like `res = p.map(f, [1, 2, 3])`. Also the `map` method take (at least in python 3) a *chunksize* argument to chunk your *iterable*. — mgc, Jun 29 '16 at 23:05
(I haven't seen your *data_grid* object was also a global variable but you might need to take a look to the [synchronization primitives](https://docs.python.org/3.5/library/multiprocessing.html#synchronization-primitives) or shared ctypes objects parts of the documentation to share variables between process) — mgc, Jun 29 '16 at 23:20

score 1 · Answer 1 · edited May 23 '17 at 12:08

This doesn't solve your multiprocessing problem but it might make your process faster.

Your pattern of using nested loops to construct n-d coordinates and then operating on them can be vectorized using ```numpy.meshgrid````. Without knowing your actual calcs this approach can't be tested.

import numpy as np
a = np.array([0,1,2])
b = np.array([10,11,12])
c = np.array([20,21,22])

x, y, z = np.meshgrid(a,b,c)

>>> x
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]])
>>> y
array([[[10, 10, 10],
        [10, 10, 10],
        [10, 10, 10]],

       [[11, 11, 11],
        [11, 11, 11],
        [11, 11, 11]],

       [[12, 12, 12],
        [12, 12, 12],
        [12, 12, 12]]])
>>> z
array([[[20, 21, 22],
        [20, 21, 22],
        [20, 21, 22]],

       [[20, 21, 22],
        [20, 21, 22],
        [20, 21, 22]],

       [[20, 21, 22],
        [20, 21, 22],
        [20, 21, 22]]])
>>> 



f = x + y + z

>>> f
array([[[30, 31, 32],
        [31, 32, 33],
        [32, 33, 34]],

       [[31, 32, 33],
        [32, 33, 34],
        [33, 34, 35]],

       [[32, 33, 34],
        [33, 34, 35],
        [34, 35, 36]]])
>>>

There is also the option of using meshgrid to create the actual points then use a single loop to iterate over the points - you lose the spatial info with this approach unless you can figure out how to reshape the result. I found this in SO answer https://stackoverflow.com/a/18253506/2823755

points = np.vstack([x,y,z]).reshape(3, -1).T

>>> points
array([[ 0, 10, 20],
       [ 0, 10, 21],
       [ 0, 10, 22],
       [ 1, 10, 20],
       [ 1, 10, 21],
       [ 1, 10, 22],
       [ 2, 10, 20],
       [ 2, 10, 21],
       [ 2, 10, 22],
       [ 0, 11, 20],
       [ 0, 11, 21],
       [ 0, 11, 22],
       [ 1, 11, 20],
       [ 1, 11, 21],
       [ 1, 11, 22],
       [ 2, 11, 20],
       [ 2, 11, 21],
       [ 2, 11, 22],
       [ 0, 12, 20],
       [ 0, 12, 21],
       [ 0, 12, 22],
       [ 1, 12, 20],
       [ 1, 12, 21],
       [ 1, 12, 22],
       [ 2, 12, 20],
       [ 2, 12, 21],
       [ 2, 12, 22]])
>>>

You can create a function and apply it to points

def g(point):
    x, y, z = point
    return x + y + z

result = np.apply_along_axis(g, 1, points)

>>> result
array([30, 31, 32, 31, 32, 33, 32, 33, 34, 31, 32, 33, 32, 33, 34, 33, 34, 35, 32, 33, 34, 33, 34, 35, 34, 35, 36])
>>>

Reshaping this example is straightforward:

>>> result.reshape(3,3,3)
array([[[30, 31, 32],
        [31, 32, 33],
        [32, 33, 34]],

       [[31, 32, 33],
        [32, 33, 34],
        [33, 34, 35]],

       [[32, 33, 34],
        [33, 34, 35],
        [34, 35, 36]]])
>>>

Test to make sure they both the same

>>> np.all(result.reshape(3,3,3) == f)
True
>>>

For more complicated maths, just iterate over points:

result = []
for point in points:
    example = some_maths
    result.append(example)

result = np.array(result).reshape(shape_of_the_3d_data)

Thanks for the suggestions. Unfortunately, the computation isn't quite as simple as a+b+c and would be a huge pain to change now. I'd much rather work with the framework I have but use multiprocessing to speed it up if possible. — Will.Evo, Jun 30 '16 at 14:34
@Will.Evo - so the second method, while slower than the first, may still be a bit faster than the nested Python loops and when you work out the multiprocessing you could still use it. You wouldn't need to use ```apply_along_axis```, just iterate over ```points``` - try it on a small dataset and see if it helps.. — wwii, Jun 30 '16 at 16:23
I am starting to see what you are saying...your suggestions actually led me to get multiprocessing to run. I returned a list of values from the model but I still need to verify their order. I also am stuck in getting multiprocessing to run through certain parts of code and not the entire file. Anyway, I'm working on it (even though I know someone on this site could do it in two seconds haha) — Will.Evo, Jun 30 '16 at 17:13
@Will.Evo - can you modify the model to except something like a [```namedtuple```](https://docs.python.org/3/library/collections.html#collections.namedtuple) which has a field for the data and a field for the original *location*? The model operates on the data *field* and returns a ```namedtuple``` with the coordinates so you can reconstruct it? — wwii, Jul 04 '16 at 03:46
I actually just wrote in the meshgrid method for the model and tested it on a short run. Seems to work fine but I won't know how much time is saved until tonight when I run a longer iteration of the model. Thanks for all the help. I'll let you know how it goes. PS: I've abandoned my multi-processing aspirations. I am in an internship and only have three weeks left for my project, I've gotta focus on getting results (even if that means long model runs). — Will.Evo, Jul 06 '16 at 20:15
So I tested the meshgrid method and it works perfectly (compared it to a previous run to make sure) but the time saved wasn't all that much. I'd say I save maybe 5-10% on run time. It translates into about an hour saved for the longer runs and for shorter runs only a few minutes. It is cleaner though so that is nice. I'm going to post my solution as an answer shortly. Thanks for all the help. — Will.Evo, Jul 07 '16 at 15:39
weirdly enough, the example I posted above actually loses time with the meshgrid method...by quite a bit of time (20 seconds when I made a, b, and c large ranges). — Will.Evo, Jul 07 '16 at 16:03
@Will.Evo, twenty seconds per loop or twenty extra seconds on a fourteen hour run? 20x20x20 is 8000 *loops*, if the actual calcs take seven seconds that is 15 hours just for the calcs. Sorry, — wwii, Jul 07 '16 at 20:21
no I mean the example code I provided in my first post. The code ran for ~11 seconds total with the for loops and ~30 seconds total with the meshgrid method. I made the range arrays for a, b, and c large enough to make the code run for a decent amount of time to compare. — Will.Evo, Jul 07 '16 at 20:35

Will.Evo · Answer 2 · 2016-07-07T16:02:35.870

As per user wwii's suggestions, I have rewrote the example above by using numpy's meshgrid and getting rid of the nested for loops for just a single loop. Here is an example of the working code.

import numpy as np
import time

a_range = np.arange(14, 18, 1)
b_range = np.arange(1000, 2200, 200)
c_range = np.arange(12, 21, 1)

a_position = range(len(a_range))
b_position = range(len(b_range))
c_position = range(len(c_range))

mesha, meshb, meshc = np.meshgrid(a_range, b_range, c_range)
mesh_vals = np.vstack([mesha, meshb, meshc]).reshape(3, -1).T

mesha_pos, meshb_pos, meshc_pos = np.meshgrid(a_position, b_position, c_position)
mesh_positions = np.vstack([mesha_pos, meshb_pos, meshc_pos]).reshape(3,-1).T

data_grid = np.zeros([len(a_range), len(b_range), len(c_range)])
record_data = []

start_time = time.time()

for pol in range(len(mesh_positions)):
    example = mesh_vals[pol][0]+ mesh_vals[pol][1]+ mesh_vals[pol][2]
    data_grid[mesh_positions[pol][0], mesh_positions[pol][1], mesh_positions[pol][2]] = example
    record_data.append([mesh_vals[pol][0], mesh_vals[pol][1], mesh_vals[pol][2], example])

print 'Code ran for ', round(time.time()-start_time,2), ' seconds'

This actually, after further investigation caused the run time to increase rather significantly. The difference between the for loops and this method was 20 seconds when subjected to a large range for a, b, and c. I have no idea why, but I do know this construction of the problem should make multiprocessing easier since there is only a single for loop to deal with.

Nested for loops using multiprocessing

2 Answers2