Updating object attributes from within module run in multiprocessor process

Question

I am relatively new to python and definitely new to multiprocessing. I'm following this question/answer for the structure of my multiprocessing, but in def func_A, I'm calling a module that passes a class object as one of the arguments. In the module, I change an object attribute that I would like the main program to see and update the user with the object attribute value. The child processes run for very long times, so I need the main program to provide updates as they run.

My suspicion is that I'm not understanding namespace/object scoping or something similar, but from what I've read, passing an object (an instance of a class?) to a module as an argument passes a reference to the object and not a copy. I would have thought this meant that changing the attributes of the object in the child process/module would have changed the attributes in the main program object (since they're the same object). Or am I confusing things?

The code for my main program:

# MainProgram.py
import multiprocessing as mp
import time
from time import sleep
import sys
from datetime import datetime
import myModule

MYOBJECTNAMES = ['name1','name2']

class myClass:
  def __init__(self, name):
    self.name = name
    self.value = 0

myObjects = []
for n in MYOBJECTNAMES:
  myObjects.append(myClass(n))

def func_A(process_number, queue):
  start = datetime.now()
  print("Process {} (object: {}) started at {}".format(process_number, myObjects[process_number].name, start))
  myModule.Eval(myObjects[process_number])
  sys.stdout.flush()

def multiproc_master():
  queue = mp.Queue()
  proceed = mp.Event()

  processes = [mp.Process(target=func_A, args=(x, queue)) for x in range(len(myObjects))]
  for p in processes:
    p.start() 

  for i in range(100):
    for o in myObjects:
      print("In main: Value of {} is {}".format(o.name, o.value))
    sleep(10)

  for p in processes:  
    p.join()

if __name__ == '__main__':
  split_jobs = multiproc_master()
  print(split_jobs)

The code for my module program:

# myModule.py
from time import sleep

def Eval(myObject):

  for i in range(100):
    myObject.value += 1
    print("In module: Value of {} is {}".format(myObject.name, myObject.value))
    sleep(5)

martineau · Accepted Answer · 2017-11-18T19:09:07.647

That question/answer you linked to probably was probably a poor choice to use as a template, as it's doing many things that your code doesn't require (much less use).

I think your biggest misconception about how multiprocessing works is thinking that all the code is running in the same address-space. The main task runs in its own, and there are separate ones for each subtask. The way your code is written, each of them will end up with its own separate myObjects list. That's why the main task doesn't see any of the changes made by any of the other tasks.

While there are ways share objects using the multiprocessing module, doing so often introduces significant overhead because keeping it or them all in-sync between all the processes requires lots of things happening "under the covers" to make seem like they're shared (which is what is really going on since they can't actually be because of having separate address-spaces). This overhead frequently completely cancels out any speed gained by parallel-processing.

As stated in the documentation: "when doing concurrent programming it is usually best to avoid using shared state as far as possible".

Thanks @martineau... the different address space makes sense and good to know. Maybe you know, but do you think a better approach would be for each module or subprocess to update a database and the main program pull information from that? — physicsnerd, Nov 18 '17 at 15:30
@physicsnerd: Storing the objects in a database sounds like a feasible approach since (generally) they support concurrent access to their contents. — martineau, Nov 18 '17 at 16:31

Updating object attributes from within module run in multiprocessor process

1 Answers1