Here's an idea on how you might call update against multiple Ball objects with the same arguments in parallel, using the multiprocessing.pool.Pool class.
Because Python serializes/de-serializes the Ball object from the main process to the pool process that executes the task, any modifications made to the object will not be reflected back in the copy that "lives" in the main process (as you found out). But that does not prevent update from returning a tuple of the attributes it modified, which the main process can then use to update its own copy of the object.
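Before the full listing, here is a minimal sketch of that round trip using a made-up Counter class (the class and its bump method are purely illustrative, not part of your code): the worker process mutates its pickled copy, returns the new state, and the parent assigns the returned value back to its own copy.

from multiprocessing import Pool

class Counter:
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1      # mutates only the pickled copy in the worker process
        return self.n    # so the new value must be returned explicitly

if __name__ == '__main__':
    counters = [Counter() for _ in range(4)]
    with Pool() as pool:
        results = [pool.apply_async(c.bump) for c in counters]
        for c, r in zip(counters, results):
            c.n = r.get()    # parent applies the returned state to its copy
    print(counters[0].n)     # 1 -- only because we assigned it back

The same idea applied to your Ball class follows: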
class Ball:
    # If this is a class constant, then it can and should stay here:
    radius = 32

    def __init__(self, x, y, vx, vy, c):
        self.x = x
        self.y = y
        self.vx = vx
        self.vy = vy
        self.color = c
        return

    def update(self, w, h, t):
        time = float(t) / 1000000.0
        #print(time)
        xp = float(self.vx) * float(time)
        yp = float(self.vy) * float(time)
        self.x += xp
        self.y += yp
        #print (str(xp) +"," +str(yp))
        if self.x < 32:
            self.vx = 0 - self.vx
            self.x += (32 - self.x)
        if self.y < 32:
            self.vy = 0 - self.vy
            self.y += (32 - self.y)
        if self.x + 32 > w:
            self.vx = 0 - self.vx
            self.x -= (self.x + 32) - w
        if self.y + 32 > h:
            self.vy = 0 - self.vy
            self.y -= (self.y + 32) - h
        # Return tuple of attributes that have changed
        # (Not used by serial benchmark)
        return (self.x, self.y, self.vx, self.vy)

    def __repr__(self):
        """
        Return internal dictionary of attributes as a string
        """
        return str(self.__dict__)


def prepare_benchmark():
    balls = [Ball(1, 2, 3, 4, 5) for _ in range(1000)]
    arg_list = (3.0, 4.0, 1.0)
    return balls, arg_list


def serial(balls, arg_list):
    for ball in balls:
        ball.update(*arg_list)


def parallel_updater(arg_list, ball):
    # Runs in a pool process; the updated attributes are returned
    # so the main process can apply them to its own copy of the ball.
    return ball.update(*arg_list)


def parallel(pool, balls, arg_list):
    from functools import partial

    worker = partial(parallel_updater, arg_list)
    results = pool.map(worker, balls)
    for idx, result in enumerate(results):
        ball = balls[idx]
        # unpack:
        ball.x, ball.y, ball.vx, ball.vy = result


def parallel2(pool, balls, arg_list):
    results = [pool.apply_async(ball.update, args=arg_list) for ball in balls]
    for idx, result in enumerate(results):
        ball = balls[idx]
        # unpack:
        ball.x, ball.y, ball.vx, ball.vy = result.get()


def main():
    import time

    # Serial performance:
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    serial(balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Serial elapsed time:', elapsed)

    print()
    print('-' * 80)
    print()

    # Parallel performance using map
    # We won't even include the time it takes to create the pool
    from multiprocessing import Pool
    pool = Pool()  # pool size is 8 on my desktop
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    parallel(pool, balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Parallel elapsed time:', elapsed)

    print()
    print('-' * 80)
    print()

    # Parallel performance using apply_async
    balls, arg_list = prepare_benchmark()
    t = time.perf_counter()
    parallel2(pool, balls, arg_list)
    elapsed = time.perf_counter() - t
    print(balls[0])
    print('Parallel2 elapsed time:', elapsed)

    pool.close()
    pool.join()


# Required for Windows
if __name__ == '__main__':
    main()
Prints:
{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Serial elapsed time: 0.0018328999999999984
--------------------------------------------------------------------------------
{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Parallel elapsed time: 0.236945
--------------------------------------------------------------------------------
{'x': -29.0, 'y': -28.0, 'vx': 3, 'vy': 4, 'color': 5}
Parallel2 elapsed time: 0.1460790000000000
I used nonsense arguments for everything, but you can see that the overhead of serializing/de-serializing the objects and updating the main process's copies cannot be compensated for by processing the 1,000 calls in parallel when the worker function is as trivial as update.
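If you want to convince yourself that the pool does pay off once each task does real work, you can pad the per-ball computation with artificial CPU load. The heavy_update and parallel_heavy functions below are purely illustrative additions (they are not part of the benchmark above), and the spin count is an arbitrary knob:

def heavy_update(ball, arg_list, spin=20000):
    # Purely illustrative: burn some CPU so each task costs far more
    # than its pickling/unpickling overhead.
    acc = 0.0
    for i in range(spin):
        acc += (i % 7) * 0.5
    return ball.update(*arg_list)


def parallel_heavy(pool, balls, arg_list):
    results = [pool.apply_async(heavy_update, args=(ball, arg_list)) for ball in balls]
    for ball, result in zip(balls, results):
        ball.x, ball.y, ball.vx, ball.vy = result.get()

With enough work per call, the serialization cost becomes a small fraction of each task and the pool should start to win over the serial loop.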
Note that benchmark Parallel2, which uses method apply_async, is actually more performant in this case than benchmark Parallel, which uses method map; that is a bit surprising. My guess is that this is due in part to having to use functools.partial to convey the additional, non-changing w, h, and t arguments (in the form of arg_list) to worker function parallel_updater, which adds an extra layer of function calls. So that's a total of two more function calls that benchmark Parallel has to make for each ball update.
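One way to test that guess is Pool.starmap, which lets each task carry its own argument tuple so neither partial nor the parallel_updater wrapper is needed. The parallel_starmap function below is a variant I'm sketching for comparison, not part of the benchmark above:

def parallel_starmap(pool, balls, arg_list):
    # Each task is a (ball, w, h, t) tuple; Pool.starmap unpacks it into
    # Ball.update(ball, w, h, t), so no wrapper function is needed.
    results = pool.starmap(Ball.update, [(ball, *arg_list) for ball in balls])
    for ball, result in zip(balls, results):
        ball.x, ball.y, ball.vx, ball.vy = result

If its timing lands between the map and apply_async numbers, that would support the idea that the extra call overhead is what separates the two benchmarks; either way, the serialization cost still dominates for a worker this small.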