
If you create a new Process in Python, it will serialize and copy the entire available scope, as far as I understand it. If you use multiprocessing.Pipe(), it also allows sending various things, not just raw bytes.
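(For example, sending an arbitrary picklable object over a Pipe looks roughly like this:)

```python
from multiprocessing import Process, Pipe

def child(conn):
    conn.send({"bytes_read": 100})  # any picklable object, not just bytes
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # {'bytes_read': 100}
    p.join()
```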

However, instead of sending, I simply want to update a variable that contains a simple POD object like this:

class MyStats:
    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0

So say that in a process, when I update these stats, I want to tell Python to serialize the object and send it to the parent process's side somehow. I don't want to have to create a multiprocessing.Value for each and every one of these attributes, that sounds super tedious.
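(To illustrate, the Value-per-field approach I'd like to avoid would look something like this:)

```python
from multiprocessing import Value

class MyStatsShared:
    def __init__(self):
        # one shared Value per attribute -- exactly the boilerplate I want to avoid
        self.bytes_read = Value('Q', 0)
        self.bytes_written = Value('Q', 0)

stats = MyStatsShared()
with stats.bytes_read.get_lock():   # updates must go through .value, under the lock
    stats.bytes_read.value += 100
```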

Is there a way to tell python to pass and overwrite a specific object property somehow?

Tomáš Zato
  • Does this answer your question? [Shared state in multiprocessing Processes](https://stackoverflow.com/questions/30264699/shared-state-in-multiprocessing-processes) – MSpiller Dec 12 '22 at 15:50
  • @M.Spiller No it doesn't and I was under the impression that my question describes that I already know what is discussed in the Q&A you have linked. – Tomáš Zato Dec 12 '22 at 16:09
  • You can create a proxy object for `MyStats` using Managers. All processes will have access to this proxy and whenever they change an instance attribute's value, the change will be reflected in all processes that have access to the proxy. – Charchit Agarwal Dec 12 '22 at 20:56
  • @CharchitAgarwal Could you please elaborate? Is a proxy something the manager can create from a type/instance? I guess if something could take my instance and replace all properties with getter/setter that uses a Value internally, that would be nice. Is that what you're referring to? – Tomáš Zato Dec 13 '22 at 11:45
  • @TomášZato Yes you can create a proxy for a class which can be used as an instance. It will not replace properties with getters and setters that use Value. Instead, it will store the actual instance in a separate process, and all commands from the proxy (method calls like `__getattr__`, `__setattr__`) will be sent over TCP and the return value passed back to the proxy. So essentially, you will synchronize the data within the instance across processes. I'll try to write an answer when I have time but you can look around on StackOverflow for examples too (the docs also go in some more details). – Charchit Agarwal Dec 14 '22 at 00:17
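(A minimal built-in variant of what this comment thread describes: `multiprocessing.Manager` already provides a `Namespace` proxy whose attribute reads and writes are forwarded to the manager process:)

```python
from multiprocessing import Manager, Process

def worker(ns):
    ns.bytes_read = 100  # the attribute write is forwarded to the manager process

if __name__ == "__main__":
    with Manager() as manager:
        ns = manager.Namespace()
        ns.bytes_read = 0
        p = Process(target=worker, args=(ns,))
        p.start()
        p.join()
        print(ns.bytes_read)  # 100
```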

1 Answer


A manager is what you need here: it will be slower, but all data stored inside will be automatically synced with other processes. Here is a simple example:

# Note: public_methods is an undocumented helper in multiprocessing.managers
from multiprocessing.managers import BaseManager, public_methods, NamespaceProxy
from multiprocessing import Process


def make_proxy(name, cls, base=None):
    """
    Args:
        name : A string that should match the variable name the proxy will be assigned to
        cls : The class for which you want to create a proxy for
        base :  If you are subclassing NamespaceProxy (or any other implementation) and want to use that subclass as the
                base for this new proxy, then pass the subclass as the base using this argument

    """
    exposed = public_methods(cls) + ['__getattribute__', '__setattr__', '__delattr__']
    return _MakeProxyType(name, exposed, base)


def _MakeProxyType(name, exposed, base=None):
    """
    Attempts to replicate multiprocessing.managers.MakeProxyType properly
    """

    if base is None:
        base = NamespaceProxy
    exposed = tuple(exposed)

    dic = {}

    for meth in exposed:
        if hasattr(base, meth):
            continue
        exec('''def %s(self, *args, **kwds):
        return self._callmethod(%r, args, kwds)''' % (meth, meth), dic)

    ProxyType = type(name, (base,), dic)
    ProxyType._exposed_ = exposed
    return ProxyType


class MyStats:

    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0


def worker(my_stats):
    my_stats.bytes_read = 100
    print("Worker process read 100 bytes!")

# Remember to set the name of the variable and the "name" argument to the same value otherwise you will have trouble
# pickling this. If for some reason you cannot do this then you must change the variable's __qualname__ property to
# reflect where the object actually resides so pickle can find it.
MyStatsProxy = make_proxy('MyStatsProxy', MyStats)

if __name__ == "__main__":

    # Register our proxy and start the manager process
    BaseManager.register("MyStats", MyStats, MyStatsProxy)
    manager = BaseManager()
    manager.start()

    # Create our shared instance and modify it from another process
    my_stats = manager.MyStats()
    p = Process(target=worker, args=(my_stats,))
    p.start()
    p.join()

    # Check value from main process
    print(f"In main process, bytes read are {my_stats.bytes_read}!")

Output

Worker process read 100 bytes!
In main process, bytes read are 100!

Check this question and its answers for more useful information about managers, registering classes, and alternative ways to achieve the same result.

Note: Keep in mind that managers return pickled copies of any object you access through them. So modifications to mutable attributes should be done from within an instance method rather than by fetching the mutable object through the proxy and modifying it outside. For example, the line below will not modify the attribute some_list inside the manager at all; only the local (per-process) copy of the attribute is changed:

my_stats.some_list[0] = "some value"

Instead, you should create an instance method for modifications and call that instead:

my_stats.modify_list(0, "some value")
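(`modify_list` here is illustrative; on the registered class it might be defined like so, with the mutation happening inside the manager process:)

```python
class MyStats:
    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0
        self.some_list = [None]

    def modify_list(self, index, value):
        # executes inside the manager process, so the change persists
        self.some_list[index] = value
```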

Alternatively, you can force the manager to pick up the change by re-assigning the modified object back to the attribute:

local_copy = my_stats.some_list
local_copy[0] = "some value"
my_stats.some_list = local_copy
Charchit Agarwal