
How can I parallelize a function that modifies an attribute of a class? In the following example, change_arr() modifies the contents of the attribute x of the class based on loc; it assigns -99 to each element of x that equals loc. When I run this script as written, I get the same array in both prints (and hence no modification took place). If, however, I comment out the line using Pool and uncomment the two lines that do the same job using a for loop, I get [-3, -2, -1, 0, 1, 2, 3] and [-99, -2, -99, 0, -99, 2, -99], which is what I want.

My question is, then, how can I parallelize the use of the function change_arr to modify the same class attribute in parallel?

I am aware that there are probably better ways to implement this. However, this is not my actual code (which is much longer and more complex), but a minimal example.

import numpy as np
from multiprocessing import Pool


class MyClass:
    def __init__(self) -> None:
        self.x = np.array([-3, -2, -1, 0, 1, 2, 3])

    def process(self):
        Pool(2).map(self.change_arr, [-3, -1, 1, 3])

        # for i in [-3, -1, 1, 3]:
        #     self.change_arr(i)

    def change_arr(self, loc):
        self.x[self.x == loc] = -99


if __name__ == '__main__':
    mc = MyClass()
    print(mc.x)
    mc.process()
    print(mc.x)
  • I'm not 100% confident, but `map(self.change_arr, [list])` will pass `list` to `self` and not to `loc`. https://stackoverflow.com/questions/5442910/how-to-use-multiprocessing-pool-map-with-multiple-arguments – thethiny Sep 26 '22 at 17:25
  • there are 2 major issues here I see right away. 1: separate processes don't share memory, so calling `Pool(2).map(self.change_arr...` must make a copy of `self` to send to the worker process and execute `change_arr` on that copy (the copy that exists in the main process is not modified). 2: you don't ever attempt to clean up the `Pool`. In this exact case, it should still function fine, but it's best to use a context manager for the pool, or at least make sure to call `close()` on the pool when you're done with it, and call `join()` to make sure it stops properly. – Aaron Sep 26 '22 at 17:49
  • @thethiny no, it won't, `self.change_arr` is a bound method, so the only parameter is `loc` – juanpa.arrivillaga Sep 26 '22 at 17:55

1 Answer


You can share memory between processes with the multiprocessing.Value and multiprocessing.Array types. I've created a minimal example of how it works without involving numpy or classes.

from multiprocessing import Pool, Array

def change_arr(loc):
    # x is the shared Array created in the main process;
    # writes to it are visible to the parent
    for i in range(len(x)):
        if loc == x[i]:
            x[i] = -99


if __name__ == '__main__':
    x = Array('i', [-3, -2, -1, 0, 1, 2, 3])
    print([i for i in x])
    with Pool(2) as pool:
        pool.map(change_arr, [-3, -1, 1, 3])
    print([i for i in x])

I'm not sure if it works with numpy arrays directly. The trick is to define x as a global variable so it is visible inside the function passed to map. Note that this relies on the workers inheriting x from the parent process, so it works with the fork start method (the default on Linux) but not with spawn (the default on Windows and macOS), where the children would not see x.

Szellem