
How can I share a nested object between Python processes with write access for tasklets (coroutines)?

Here is a simplified example with an analogy that I wrote just to ask this question properly:

First of all, please install the greenlet package with: sudo pip install greenlet

In the example below:

  • An instance of the Nature class is referenced by the habitat variable.
  • This Nature instance has an instance variable called animals.
  • During the initialization of this Nature instance, 8 different instances of the Animal class are created and appended to the animals instance variable. If I'm correct, this makes the Nature instance a nested object.
  • As the last step, the live instance methods of the Animal instances randomly switch to one another using the greenlet package's switch() function until global_counter reaches 1000. This live method randomly changes the value of the limbs instance variable of each Animal instance.

greentest.py:

import random
from greenlet import greenlet

global_counter = 0

class Animal():

    def __init__(self,nature):
        self.limbs = 0
        nature.animals.append(self)
        self.tasklet = greenlet(self.live)

    def live(self,nature):
        global global_counter
        while True:
            self.limbs = random.randint(1, 10)
            global_counter += 1
            if global_counter > 1000:
                break
            random.sample(nature.animals,1)[0].tasklet.switch(nature)

class Nature():

    def __init__(self,how_many):
        self.animals = []
        for i in range(how_many):
            Animal(self)
        print str(how_many) + " animals created."
        self.animals[0].live(self)

The result is:

>>> import greentest
>>> habitat = greentest.Nature(8)
8 animals created.
>>> habitat.animals[0].limbs
3
>>> greentest.global_counter
1002

It works as expected: the values of limbs and global_counter are changed (non-zero).

But when I add multiprocessing to the equation:

greentest2.py:

import random
import multiprocessing
from greenlet import greenlet

global_counter = 0

class Animal():

    def __init__(self,nature):
        self.limbs = 0
        nature.animals.append(self)
        self.tasklet = greenlet(self.live)

    def live(self,nature):
        global global_counter
        while True:
            self.limbs = random.randint(1, 10)
            global_counter += 1
            if global_counter > 1000:
                break
            random.sample(nature.animals,1)[0].tasklet.switch(nature)

class Nature():

    def __init__(self,how_many):
        self.animals = []
        for i in range(how_many):
            Animal(self)
        print str(how_many) + " animals created."
        #self.animals[0].live(self)
        jobs = []
        for i in range(2):
            p = multiprocessing.Process(target=self.animals[0].live, args=(self,))
            jobs.append(p)
            p.start()

The result is not as expected:

>>> import greentest2
>>> habitat = greentest2.Nature(8)
8 animals created.
>>> habitat.animals[0].limbs
0
>>> greentest2.global_counter
0

Both the values of limbs and global_counter are unchanged (zero). I think this is because the instances of the Animal class and global_counter are not shared between processes. So how can I share this instance of the Nature class, or these instances of the Animal class, between processes?
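
To illustrate what I think is happening, here is a stripped-down sketch (with hypothetical names, unrelated to my classes) showing that a child process only mutates its own copy of an object:

import multiprocessing

class Box(object):

    def __init__(self):
        self.value = 0

def mutate(box):
    box.value = 42  # changes only the copy living in the child process

if __name__ == '__main__':
    box = Box()
    p = multiprocessing.Process(target=mutate, args=(box,))
    p.start()
    p.join()
    print(box.value)  # still 0 in the parent process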

ADDITION according to @noxdafox's answer:

greentest3.py:

import random
import multiprocessing
from greenlet import greenlet

global_counter = multiprocessing.Value('i', 0)

class Animal():

    def __init__(self,nature):
        self.limbs = 0
        nature.animals.append(self)
        self.tasklet = greenlet(self.live)

    def live(self,nature):
        global global_counter
        while True:
            self.limbs = random.randint(1, 10)
            global_counter.value += 1
            if global_counter.value > 1000:
                break
            random.sample(nature.animals,1)[0].tasklet.switch(nature)

class Nature():

    def __init__(self,how_many):
        self.animals = []
        for i in range(how_many):
            Animal(self)
        print str(how_many) + " animals created."
        #self.animals[0].live(self)
        jobs = []
        for i in range(2):
            p = multiprocessing.Process(target=self.animals[0].live, args=(self,))
            jobs.append(p)
            p.start()

and then the result is:

>>> import greentest3
>>> habitat = greentest3.Nature(8)
8 animals created.
>>> habitat.animals[0].limbs
0
>>> greentest3.global_counter.value
1004

I was perfectly aware that global_counter can be shared with this method since it's an integer, but I'm actually asking how to share the instances of the Nature and Animal classes between processes.
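
For reference, the closest direction I have found so far is registering a class with a multiprocessing Manager so that all processes talk to one shared proxy, but I'm not sure how well this scales to a nested object graph like Nature holding Animal instances. A rough, hypothetical sketch of what I mean (simplified names, not my actual classes):

import multiprocessing
from multiprocessing.managers import BaseManager

class SharedAnimal(object):  # hypothetical, simplified stand-in for Animal

    def __init__(self):
        self.limbs = 0

    def set_limbs(self, n):
        self.limbs = n

    def get_limbs(self):
        return self.limbs

class AnimalManager(BaseManager):
    pass

AnimalManager.register('SharedAnimal', SharedAnimal)

def grow(animal):
    animal.set_limbs(8)  # goes through the proxy to the manager process

if __name__ == '__main__':
    manager = AnimalManager()
    manager.start()
    animal = manager.SharedAnimal()  # a proxy object, not a plain instance
    p = multiprocessing.Process(target=grow, args=(animal,))
    p.start()
    p.join()
    print(animal.get_limbs())  # 8: the change is visible in the parent process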

mertyildiran
  • You understood the problem correctly. A typical solution is to use queue or pipe. https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes – Kenji Noguchi Dec 12 '16 at 20:19
  • @KenjiNoguchi Could you explain more or could you answer with a little improved version of my code, please. – mertyildiran Dec 12 '16 at 20:29
  • I believe the correct term is [_circular references_](https://en.wikipedia.org/wiki/Circular_reference). and you have a list of jobs but don't you need to check if they have finished before assuming they have done their work? – Tadhg McDonald-Jensen Dec 12 '16 at 20:40
  • @TadhgMcDonald-Jensen Processes' job is symbolic on my code example. But they will sure write to instances of `Animal` class. Could you explain how can I use *circular references* on my case? – mertyildiran Dec 12 '16 at 21:44
  • "Now if I'm correct this instance of `Nature` is a nested object." I think _nested_ is the wrong term, `Nature` has reference to several `Animal` instances and they all have references to `Nature` object, so their references to each other are "circular". Sorry for not clarifying that in my original comment. – Tadhg McDonald-Jensen Dec 12 '16 at 22:16
  • Why do you need to share the instances of Nature and Animal classes between processes? That's a huge leap from the original question. A general purpose distributed computation system such as MapReduce has a master (driver) and multiple worker processes. The master maps computation tasks to workers, each worker calculates small part of the problem, then finally the master collects the results back from workers. Unless the worker is stateless (no inter dependency between workers) the computation would be complicated (race conditions, mutex, etc) – Kenji Noguchi Dec 13 '16 at 16:18

1 Answer


Different processes do not share their memory.

If what you need to share is a single variable, you can probably use multiprocessing.Value:

import multiprocessing

def function(counter):
    counter.value += 1

counter = multiprocessing.Value('i')
p = multiprocessing.Process(target=function, args=(counter,))
p.start()
p.join()

EDIT: answering according to updates.

There is no abstraction mechanism that allows sharing entire objects in memory. Shared memory is usually implemented as a simple array that processes can read from and write to once they have acquired the resource.
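
As a rough sketch of what that looks like in practice (hypothetical layout: one integer slot per animal holding its limbs count), the closest built-in abstraction is multiprocessing.Array:

import multiprocessing

def grow_limbs(limbs, index):
    limbs[index] = 10  # writes directly into the shared memory block

if __name__ == '__main__':
    limbs = multiprocessing.Array('i', 8)  # 8 integers, all initialized to 0
    p = multiprocessing.Process(target=grow_limbs, args=(limbs, 0))
    p.start()
    p.join()
    print(limbs[0])  # 10: both processes see the same memory block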

Moreover, OOP and threading/multiprocessing don't mix well together; IMHO the combination should be considered an anti-pattern. On top of complex objects, you add concurrent access to and modification of their properties. This is a one-way ticket to long and tedious debugging sessions.

The recommended pattern is the use of message queues. Imagining threads and processes as isolated entities that communicate via specific channels significantly simplifies the problem.
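
A minimal sketch of that pattern (hypothetical names): workers never touch each other's objects, they only send results back on a queue, and the parent applies those results to its own copy of the data:

import multiprocessing

def worker(queue, animal_id):
    # compute a result locally, then report it; no shared state is involved
    queue.put((animal_id, 10))

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    limbs = {}  # the parent's own view of the animals
    jobs = [multiprocessing.Process(target=worker, args=(queue, i)) for i in range(2)]
    for p in jobs:
        p.start()
    for _ in jobs:
        animal_id, new_limbs = queue.get()  # collect one message per worker
        limbs[animal_id] = new_limbs
    for p in jobs:
        p.join()
    print(limbs)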

noxdafox
  • I know, but how can I share the object? Sharing an integer is easy, but I don't know how to share the instances of `Nature` and `Animal` classes. – mertyildiran Dec 12 '16 at 21:42