5

I have a problem, which is similar to this:

import numpy as np

C = np.zeros((100,10))

for i in range(10):
    C_sub = get_sub_matrix_C(i, other_args) # shape 10x10
    C[i*10:(i+1)*10,:10] = C_sub

There is apparently no need to run this serially, since each submatrix can be calculated independently. I would like to use the multiprocessing module and create up to 4 processes for the for loop. I read some tutorials about multiprocessing, but couldn't figure out how to apply it to my problem.

Thanks for your help

RoSt
  • 2
    For multiprocessing to yield a performance improvement, the computations **must** take significant time, because multiprocessing is going to *serialize* the data, send it to the subprocesses, deserialize it, perform the computations, serialize the result, send it back to the main process and finally deserialize it. Serialization/deserialization takes quite some time, and inter-process communication isn't that fast either. If `get_sub_matrix_C` is literally just a few matrix accesses, you aren't going to obtain any speedup. – Bakuriu Mar 08 '16 at 12:48
  • This is just for illustration purposes. In the end my matrix will have dimensions of about 100000 x 20000, but more importantly, get_sub_matrix_C is kind of slow and I don't think I can make that function any faster. – RoSt Mar 08 '16 at 12:52
  • Does get_sub_matrix_C need to access the whole matrix or just the submatrix? If it needs all of it, serializing one copy of the big matrix for each subprocess will be very time- and memory-consuming. – eguaio Mar 08 '16 at 12:54
  • Actually, get_sub_matrix_C doesn't depend on any entries of C. It just gives the submatrix that I want to write in C, where i determines the "position". – RoSt Mar 08 '16 at 12:57

2 Answers

4

A simple way to parallelize that code would be to use a Pool of processes:

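import multiprocessing

# Cap the pool at 4 worker processes, as asked in the question.
pool = multiprocessing.Pool(processes=4)
# starmap unpacks each (i, other_args) tuple into positional arguments.
results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))

# Results come back in input order, so result i belongs to block i.
for i, res in enumerate(results):
    C[i*10:(i+1)*10,:10] = res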

I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
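For reference, a self-contained version of this approach might look like the sketch below. It is only an illustration: `get_sub_matrix_C` here is a hypothetical stand-in that fills a 10x10 block with `i + other_args`, `build_C` is a made-up helper name, and the pool is capped at 4 workers as requested in the question.

```python
import multiprocessing

import numpy as np


def get_sub_matrix_C(i, other_args):
    # Hypothetical stand-in for the real (slow) computation:
    # returns a 10x10 block filled with i + other_args.
    return np.full((10, 10), float(i + other_args))


def build_C(n_blocks=10, other_args=0, processes=4):
    C = np.zeros((n_blocks * 10, 10))
    # starmap unpacks each tuple into positional arguments and
    # returns the results in the same order as the inputs.
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.starmap(
            get_sub_matrix_C, [(i, other_args) for i in range(n_blocks)]
        )
    for i, res in enumerate(results):
        C[i * 10:(i + 1) * 10, :10] = res
    return C


if __name__ == "__main__":
    # The guard matters on platforms that start workers by
    # re-importing the main module (e.g. Windows).
    C = build_C()
    print(C.shape)
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.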

Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.
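One possible lower-level approach (not necessarily the one meant above) is to let the workers write their blocks directly into shared memory, so that only the small index `i` travels back to the parent. The sketch below assumes a `float64` matrix of fixed size and uses `multiprocessing.RawArray` together with a pool initializer; `get_sub_matrix_C` is again a hypothetical stand-in, and the helper names `_init_worker`, `_fill_block` and `build_C_shared` are made up for illustration.

```python
import multiprocessing

import numpy as np

N_BLOCKS, BLOCK, COLS = 10, 10, 10

_shared_C = None  # set in each worker by _init_worker


def get_sub_matrix_C(i, other_args):
    # Hypothetical stand-in for the real computation.
    return np.full((BLOCK, COLS), float(i + other_args))


def _init_worker(raw):
    # Runs once per worker: wrap the shared buffer in a NumPy view (no copy).
    global _shared_C
    _shared_C = np.frombuffer(raw, dtype=np.float64).reshape(
        N_BLOCKS * BLOCK, COLS
    )


def _fill_block(args):
    i, other_args = args
    # Write straight into shared memory; only the index i is sent back.
    _shared_C[i * BLOCK:(i + 1) * BLOCK, :] = get_sub_matrix_C(i, other_args)
    return i


def build_C_shared(other_args=0, processes=4):
    # RawArray allocates unsynchronized shared memory; that is safe here
    # because every task writes to a disjoint block of rows.
    raw = multiprocessing.RawArray("d", N_BLOCKS * BLOCK * COLS)
    with multiprocessing.Pool(
        processes, initializer=_init_worker, initargs=(raw,)
    ) as pool:
        pool.map(_fill_block, [(i, other_args) for i in range(N_BLOCKS)])
    # Copy out of the shared buffer so the result outlives it.
    view = np.frombuffer(raw, dtype=np.float64)
    return view.reshape(N_BLOCKS * BLOCK, COLS).copy()


if __name__ == "__main__":
    print(build_C_shared().shape)
```

Whether this pays off depends on how large the blocks are relative to the cost of computing them, so it is worth timing both variants on the real workload.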


It looks like you are running a version of Python older than 3.3, which is when starmap was added. You can replace starmap with plain map, but then you have to provide a function that takes a single parameter:

def f(args):
    return get_sub_matrix_C(*args)

pool = multiprocessing.Pool()
results = pool.map(f, ((i, other_args) for i in range(10)))

for i, res in enumerate(results):
    C[i*10:(i+1)*10,:10] = res
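
When `other_args` is the same for every call, as in the question, `functools.partial` is another way to obtain a one-argument callable for `map`. A minimal sketch, assuming a stand-in `get_sub_matrix_C` whose second parameter is named `other_args` (`build_C_partial` is a hypothetical helper name):

```python
import functools
import multiprocessing

import numpy as np


def get_sub_matrix_C(i, other_args):
    # Hypothetical stand-in for the real computation.
    return np.full((10, 10), float(i * other_args))


def build_C_partial(other_args, processes=4):
    # Bind the constant argument; the partial object can be pickled
    # because get_sub_matrix_C is defined at module level.
    f = functools.partial(get_sub_matrix_C, other_args=other_args)
    pool = multiprocessing.Pool(processes)
    try:
        # map preserves input order, so block i ends up at position i.
        results = pool.map(f, range(10))
    finally:
        pool.close()
        pool.join()
    # Stack the ten 10x10 blocks vertically into a 100x10 matrix.
    return np.vstack(results)


if __name__ == "__main__":
    print(build_C_partial(other_args=3).shape)
```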
Bakuriu
  • Thanks for your answer. Unfortunately I can't test it, since I don't have starmap. Probably I'm using an outdated version of multiprocessing? Version: 0.70a1 – RoSt Mar 08 '16 at 13:14
  • @RoSt You can use `map` and modify the function to accept a single parameter. I've edited the answer to add this solution too. – Bakuriu Mar 08 '16 at 13:30
  • Thanks for the easy and straightforward solution. It works fine. I would vote you up, but my own reputation is <15, sorry... – RoSt Mar 08 '16 at 13:40
0

The following recipe may do the job. Feel free to ask if anything is unclear.

import numpy as np
import multiprocessing

def own_process(i, other_args, out_queue):
    # Defined at module level so it can be pickled under the "spawn" start method.
    C_sub = get_sub_matrix_C(i, other_args)
    # Send the index along with the block: results arrive in completion order.
    out_queue.put((i, C_sub))

def processParallel():
    sub_matrices_list = [None] * 10
    out_queue = multiprocessing.Queue()
    other_args = 0
    procs = []
    for i in range(10):
        p = multiprocessing.Process(
                            target=own_process,
                            args=(i, other_args, out_queue))
        procs.append(p)
        p.start()

    # Drain the queue before joining, so no child blocks on a full pipe.
    for _ in range(10):
        i, C_sub = out_queue.get()
        sub_matrices_list[i] = C_sub

    for p in procs:
        p.join()

    return sub_matrices_list

if __name__ == "__main__":
    C = np.zeros((100,10))

    result = processParallel()

    for i in range(10):
        C[i*10:(i+1)*10,:10] = result[i]
eguaio
  • Thanks for your answer. I tried it, but I got confusing results. The same entries were repeated over and over again. – RoSt Mar 08 '16 at 13:42
  • 1
    I just corrected the bug, sorry. Anyway, the other answer seems more succinct and practical. I will try it myself too! :) – eguaio Mar 08 '16 at 14:28