
I program mostly in C and C++, and recently converted a project over to Python. However, I haven't been able to convert the multiprocessing as easily.

In the example I have an array filled with instances of a ball class, which has a member function named update that takes 3 arguments.

The class is shown below; the instances are stored in an array called balls. I've gone through plenty of posts, documentation, and videos and haven't found anything covering this; a few get close, but they don't show how to deal with the arguments being passed in.

Ideally I would create a process pool and let it split the work up between the processes. I need to retrieve the objects and update the ones in the original process space.

I'm not sure, but it looks like it may be easier to have update return a tuple with all the data and then write another function that updates the class from that tuple.

Feedback on the best way to do this in Python is appreciated. Also, I value performance over ease of implementation; that's the point of doing this, after all. Thanks in advance.

class Ball:
          
    def __init__(self,x,y,vx,vy,c):
        self.x=x
        self.y=y
        self.vx=vx
        self.vy=vy
        self.color=c
        return
    @classmethod
    def update(self,w,h,t):
        time = float(t)/float(1000000)
        #print(time)
        xp = float(self.vx)*float(time)
        yp= float(self.vy)*float(time)
        self.x += xp
        self.y += yp
        #print (str(xp) +"," +str(yp))
        if self.x<32:
            self.vx = 0 - self.vx
            self.x += (32-self.x)
        if self.y<32:
            self.vy = 0 - self.vy
            self.y += (32-self.y)
        if self.x+32>w:
            self.vx = 0 - self.vx
            self.x -= (self.x+32)-w
        if self.y+32>h:
            self.vy = 0 - self.vy
            self.y -= (self.y+32)-h
        return

The class is updated via the following method:

def play_u(self):
    t = self.gt.elapsed_time()
    self.gt.set_timer()
    for i in self.balls:
        i.update(self.width,self.height,t)
    return
Diconica
  • Just as an aside: As with C++, Python supports class attributes as well as instance attributes and it's not clear what you are intending for your class `Ball`. For example, your `vx` attribute is defined as a class attribute and in your methods you are naming the first argument `cls` according to the convention that these are class methods, but the methods are not decorated with `@classmethod`, so they are actually instance methods. So in method `update` when you assign a value to `cls.vx` you are *not* updating the class attributes. Get rid of the class attributes and rename `cls` to `self`. – Booboo Dec 21 '21 at 12:20
  • But if you were really intending everything to be class attributes and for your methods to be updating these class attributes, then you must decorate your methods with `@classmethod`. But that wouldn't make sense if you are dealing with an array of these, would it? – Booboo Dec 21 '21 at 12:23
  • Multiprocessing carries substantial overhead just in passing arguments from one address space to another. To make it worthwhile, the processing done by `update` would have to require sufficient CPU resources such that the savings gained by parallelism offsets the additional overhead incurred. I am not sure that would be the case here. It would help if you updated the question to show how you actually call `update` multiple times, i.e. post something approaching a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Booboo Dec 21 '21 at 12:37
  • I'll make the style changes you all suggested. Thanks on that part. Sorry, I'm just getting used to Python. I ran a search on @classmethod; I had to find it on another site, which is typical of what I have found when it comes to PEP8 documentation. Here is the full project: https://github.com/Diconica/game_Engine. It's a game engine being converted from C++. An array of class objects is standard in OOP programming. Are you trying to say Python can't do OOP? It's normal in every single other OOP language: C++, C#, Java... – Diconica Dec 21 '21 at 16:17
  • BTW, thanks. Frankly, up to this point I didn't know there were two different variable types, class and instance. In other languages the variable is declared in the class as a member of it. The way it is described here as a shared variable makes it almost like a singleton. – Diconica Dec 21 '21 at 16:38
  • Of course, Python can do standard OOP. But what you had created with declaring `x = 0` following `class Ball:` were *class* attributes, while your methods were instance methods updating instance attributes (not the class attributes). So (1) there was no point in even having the class attributes defined at all, and (2) you should be naming the first argument of your methods *self* to be aligned with PEP8. Your updated code looks *almost* right, except you now want to get rid of the `@classmethod` decorator, assuming that you want `update` to be the equivalent of a C++ virtual method. – Booboo Dec 21 '21 at 17:11
  • I get that. As I said, I just found out the difference between class and instance methods. C, C++, Java, C#... I don't know of another language that does that. The way Python declares class variables is the normal way for all other languages to declare instance variables. Also, there is no easy method to split class methods among multiple files. I do mean class methods, such as the game engine's main class. – Diconica Dec 21 '21 at 17:24
  • What I am calling *class attributes*, as contrasted with *instance attributes*, would be called *static members* in C++ parlance: shared by all object instances and accessible via the class without even having an object instance, if that helps to make things clearer. So your `x = 0` was defining the equivalent of a static member. – Booboo Dec 21 '21 at 17:24
  • In C++, a compiled language, you must declare all the members up front. But in Python attributes (equivalent to members) can be dynamically added to object instances or classes at any time and the equivalent of the C++ constructor, i.e. the `__init__` method is where the attributes are "declared" simply by assigning values to attributes. – Booboo Dec 21 '21 at 17:28
  • See [Difference between staticmethod and classmethod](https://stackoverflow.com/questions/136097/difference-between-staticmethod-and-classmethod). Then see [this Python program](https://ideone.com/ytsgWs) and this somewhat equivalent (C++ doesn't have the distinction of static vs. class methods) [C++ program](https://ideone.com/23Lxpi), which might also help; see also the sketch just below. – Booboo Dec 21 '21 at 18:25
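
A minimal sketch of the class-vs-instance attribute distinction discussed in the comments above; the `Counter` class here is hypothetical, purely for illustration:

    class Counter:
        total = 0  # class attribute: shared by all instances, like a C++ static member

        def __init__(self):
            self.count = 0  # instance attribute: one per object

        def bump(self):
            self.count += 1  # updates only this instance

        @classmethod
        def bump_total(cls):
            cls.total += 1  # updates the shared class attribute

    a, b = Counter(), Counter()
    a.bump()
    Counter.bump_total()
    print(a.count, b.count)  # 1 0  (independent per instance)
    print(a.total, b.total)  # 1 1  (shared via the class)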

1 Answer


Here's an idea of how you might call `update` against multiple `Ball` objects with the same arguments in parallel, using the `multiprocessing.pool.Pool` class.

Because Python serializes/deserializes the Ball object on its way from the main process to the pool process that executes the task, any modifications to the object will not be reflected back in the copy that "lives" in the main process (as you found out). But that does not prevent update from returning a list (or tuple) of the attributes that have been modified, which the main process can then use to update its own copy of the object.

class Ball:
    # If this is a class constant, then it can and should stay here:
    radius = 32

    def __init__(self, x, y, vx, vy, c):
        self.x = x
        self.y = y
        self.vx = vx
        self.vy = vy
        self.color = c
        return

    def update(self, w, h, t):
        time = float(t) / 1000000.0
        #print(time)
        xp = float(self.vx) * float(time)
        yp = float(self.vy) * float(time)
        self.x += xp
        self.y += yp
        #print (str(xp) +"," +str(yp))
        if self.x < 32:
            self.vx = 0 - self.vx
            self.x += (32 - self.x)
        if self.y < 32:
            self.vy = 0 - self.vy
            self.y += (32 - self.y)
        if self.x + 32 > w:
            self.vx = 0 - self.vx
            self.x -= (self.x + 32) - w
        if self.y + 32 > h:
            self.vy = 0 - self.vy
            self.y -= (self.y + 32) - h
        # Return tuple of attributes that have changed
        # (Not used by serial benchmark)
        return (self.x, self.y, self.vx, self.vy)

    def __repr__(self):
        """
        Return internal dictionary of attributes as a string
        """
        return str(self.__dict__)

def prepare_benchmark():
    balls = [Ball(1, 2, 3, 4, 5) for _ in range(1000)]
    arg_list = (3.0, 4.0, 1.0)
    return balls, arg_list

def serial(balls, arg_list):
    for ball in balls:
        ball.update(*arg_list)

def parallel_updater(arg_list, ball):
    return ball.update(*arg_list)

def parallel(pool, balls, arg_list):
    from functools import partial

    worker = partial(parallel_updater, arg_list)
    results = pool.map(worker, balls)
    for idx, result in enumerate(results):
        ball = balls[idx]
        # unpack:
        ball.x, ball.y, ball.vx, ball.vy = result

def parallel2(pool, balls, arg_list):
    results = [pool.apply_async(ball.update, args=arg_list) for ball in balls]
    for idx, result in enumerate(results):
        ball = balls[idx]
        # unpack:
        ball.x, ball.y, ball.vx, ball.vy = result.get()

def main():
    import time

    # Serial performance:
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    serial(balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Serial elapsed time:', elapsed)

    print()
    print('-'*80)
    print()

    # Parallel performance using map
    # We won't even include the time it takes to create the pool
    from multiprocessing import Pool
    pool = Pool() # pool size is 8 on my desktop
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    parallel(pool, balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Parallel elapsed time:', elapsed)

    print()
    print('-'*80)
    print()

    # Parallel performance using apply_async
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    parallel2(pool, balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Parallel2 elapsed time:', elapsed)


    pool.close()
    pool.join()


# Required for windows
if __name__ == '__main__':
    main()

Prints:

{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Serial elapsed time: 0.0018328999999999984

--------------------------------------------------------------------------------

{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Parallel elapsed time: 0.236945

--------------------------------------------------------------------------------

{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Parallel2 elapsed time: 0.1460790000000000

I used nonsense arguments for everything, but you can see that the overhead of the serialization/deserialization and of updating the main process's objects cannot be compensated for by processing the 1,000 calls in parallel when you have such a trivial worker function as `update`.

Note that benchmark Parallel2, which uses method apply_async, is actually more performant in this case than benchmark Parallel, which uses method map; that is a bit surprising. My guess is that this is due in part to having to use functools.partial to convey the additional, non-changing w, h, and t arguments (in the form of arg_list) to worker function parallel_updater, which requires an additional function call. So that's a total of two more function calls that benchmark Parallel has to make for each ball update.
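
As a sketch only (not benchmarked here), those two extra calls could be avoided with `Pool.starmap`, which unpacks one argument tuple per task through a module-level helper; `update_worker` and `parallel3` are names introduced purely for illustration:

    def update_worker(ball, w, h, t):
        # Module-level so the pool can pickle a reference to it
        return ball.update(w, h, t)

    def parallel3(pool, balls, arg_list):
        # One (ball, w, h, t) tuple per task; starmap unpacks each tuple
        results = pool.starmap(update_worker, [(ball, *arg_list) for ball in balls])
        for ball, result in zip(balls, results):
            ball.x, ball.y, ball.vx, ball.vy = result

Note that each ball is still pickled to the worker and its new state pickled back, so this does not remove the fundamental serialization overhead; it only trims the per-task function-call bookkeeping.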

Booboo
  • Thanks, Booboo. I figured out the issue with the class and instance variables when you mentioned `@classmethod`. The way it's done in the C++ engine: we generate the pool at the start of the program in the game class, one per processor core. Then we split the load among the processes each loop. Each thread or process has its own queue we can dump the tasks into; that way the work is divided among the CPUs. Sean Parent has a good video covering the performance differences: https://www.youtube.com/watch?v=zULU6Hhp42w. Without async the time is 1.53e-6; async actually slows it down. – Diconica Dec 22 '21 at 14:43
  • I'm guessing async in Python is much like in C++: it generates its thread/process on the fly. Is there no way to do it like we do in C++, where we generate the pool and workers at the start of the game, with each process having its own queue, and then dump whatever workload we want onto them in the update function? From what I've been reading, you use "queue" to mean something different from C++ when it comes to multiprocessing. We use it to store tasks, not so much as shared memory; the task contains the shared memory. – Diconica Dec 22 '21 at 14:56
  • No. When you have a pool of N processes/threads (N is 2 in the demo I posted), there are underlying input and output queues that you do not see. The processes are created at the outset and they each block waiting to read the next *chunk* of tasks from the input queue. When tasks are submitted with `map` (which blocks until results are returned) or `map_async`, the N tasks implied by the N elements of the *iterable* argument are broken up into chunks of a size determined by the *chunksize* argument and written to the queue in those chunks, to minimize the number of interprocess transfers. (more...) – Booboo Dec 22 '21 at 15:08
  • So when the process picks off the next chunk, it will in general consist of multiple tasks, and the process will execute all of them, writing its return values, also in chunks, to the output queue. When you use `apply_async`, there is no *chunksize* argument and the task is transferred immediately to the input queue as a chunk of size 1. That is why `map` can be more efficient than using `apply_async`. But `map` requires that you use the same worker function for each element of the *iterable*, which was not the case for the demo (you were using different `Ball` instances). – Booboo Dec 22 '21 at 15:14
  • All the tasks submitted by the `map` function will be executed in parallel to the extent allowed by the number of processes you have in the pool even though you block until all the submitted tasks have been completed. You would only need to use `map_async` if you had, for example, a second map function you wanted to be executing in parallel with the first. – Booboo Dec 22 '21 at 15:18
  • OK, so what I'm hearing is: separate the function from the class so that it is the same worker function for all of them. Would it be of any benefit to create a manager in the game class and make the objects accessible through that? From what I read, I think all the processes could then access it, and it would also take up less memory. BTW, thanks again for explaining all this. – Diconica Dec 22 '21 at 20:14
  • First, chunking becomes a performance issue when your *iterable* is very large, but I still have no idea what your actual program is doing. That is why I have asked several times that you update the question showing how you invoke the `update` method repeatedly, so I would have an idea of what your list of balls is like. But yes, you could have a non-class function `def update(ball, w, h, t): return ball.update(w, h, t)` and then use `pool.starmap(update, arg_list)`, where `arg_list` is a list of lists or tuples, each representing the arguments for one invocation of update. – Booboo Dec 22 '21 at 20:35
  • But if you decide to create a managed class of some sort (if that is what you are talking about), then the objects you will be passing to your worker function are actually proxy objects and every method call results in the equivalent of a remote procedure call via a socket (Linux) or named pipe (Windows) to a process started by the manager. In short, it is not very performant. That is why I did not propose it. I thought it less expensive for `update` just to return some state that could be updated back in the main process. (more...) – Booboo Dec 22 '21 at 20:38
  • It is not as if the balls have to interact with one another on any call to `map`. If that were the case, then each ball would have to be passed the complete list of balls, and these would have to be either in shared memory (which is difficult when they are highly structured) or managed objects. – Booboo Dec 22 '21 at 20:42
  • I updated the original post as you requested. I also had previously posted a link to the GitHub repository where the complete project can be seen, just in case it is needed. – Diconica Dec 23 '21 at 00:41
  • Also, if I did have the balls interact with one another, I would subdivide the space into cells with storage arrays containing the occupants. Then each ball would only need to check a few occupant lists and test against those. Doing otherwise is extremely inefficient. Either use a BSP or a grid method. – Diconica Dec 23 '21 at 00:47
  • Sorry -- I did not see the link to the GitHub repository. Your use of the name *self*, for example, as the formal argument name to function `load_content`, as if `load_content` were an instance method of class `Game`, is most unusual. `load_content` is *not* an instance method of any class and you should not be naming the argument *self*. What it is being passed is a `Game` instance, so perhaps a name like *game* would be much clearer to somebody looking at this source in isolation as to what object is being updated. Or make `load_content` a "member" of `Game`, which is what I would do. – Booboo Dec 23 '21 at 12:04
  • I have updated the answer to align with your actual Github code. – Booboo Dec 23 '21 at 12:04
  • The confusion is clearly caused by my inexperience and lack of understanding of Python's styles and so on; no need for you to apologize. Also, ouch. If those benchmarks match on my system, that would mean parallelizing it is 100 times slower. Right now it takes 0.002 seconds for all 1000 of the balls, and 2 microseconds for each. – Diconica Dec 23 '21 at 22:46
  • It's starting to feel like, unless something is an extremely heavy task, parallelizing it in Python is a complete waste of time. If I were writing something super heavy like a ray tracer or a complex neural network, it might be worthwhile. Honestly, I'm not sure my own ray tracer could benefit under this. I really appreciate all the work you put into this; it's kind of looking pointless at this point. If it were a bit more expensive, it could be made up for in CPU count, but at 100 times slower there's no way of making up for that. – Diconica Dec 23 '21 at 22:58
  • Yes. My conclusion is that your update method is not sufficiently CPU-intensive to benefit from multiprocessing. If you feel it is justified, you can *accept* the answer until a better one comes along. – Booboo Dec 23 '21 at 23:31
  • Given the overhead of communication between Python processes, I would need to have a single process handle all the updates of one type in a single call to it. That means I could have one process update the AI, one do all the physics, and one handle movement and collision, so long as I send it all the data for the 1000 balls in a single call. That way the loop calling each ball's update runs inside the worker process, not while sending the tasks to it (see the sketch below). – Diconica Dec 24 '21 at 15:14
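
A minimal sketch of that batching idea, building on the answer's code above and assuming `Ball.update` returns the `(x, y, vx, vy)` tuple; `update_chunk`, `parallel_chunked`, and `n_chunks` are names introduced here purely for illustration:

    def update_chunk(balls_chunk, w, h, t):
        # One task per chunk: the per-ball loop runs inside the worker process,
        # so per-task overhead is paid once per chunk instead of once per ball
        return [ball.update(w, h, t) for ball in balls_chunk]

    def parallel_chunked(pool, balls, arg_list, n_chunks=8):
        size = -(-len(balls) // n_chunks)  # ceiling division
        chunks = [balls[i:i + size] for i in range(0, len(balls), size)]
        async_results = [pool.apply_async(update_chunk, args=(chunk, *arg_list))
                         for chunk in chunks]
        # Results come back chunk by chunk, in submission order
        flat = [state for r in async_results for state in r.get()]
        for ball, state in zip(balls, flat):
            ball.x, ball.y, ball.vx, ball.vy = state

The pickling cost of the balls themselves remains, so this reduces only the per-task dispatch overhead; `pool.map` with an explicit *chunksize* argument achieves much the same effect.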