0

I meet a practical problem in my current work. I need to share a complex python object such as following structure between multiprocesses:

import numpy as np
from typing import List, Tuple

class ComplexClass:
    def __init__(
            self,
            u: np.ndarray,
            v: Tuple[Tuple[int, float]],
            w: str,
            x: int,
            y: List[Tuple[int, np.ndarray]],  # all numpy array here with same shape
            z: List[List[int]],  # inner list with same length
    ):
        self.u = u
        self.v = v
        self.w = w
        self.x = x
        self.y = y
        self.z = z

My server's RAM is 128G, one object above is ~50G. We need to use >=4 processes to handle above object to meet timeout requirement. I found this object will have 4 copies in 4 processes which are out of memory. Even I apply to run it in server with larger RAM, the overhead of copy objects is huge. In my current usecase, this object is read-only in 4 processes, therefore there is no data races and I don't need any lock.

The whole structure of problem is somehow like following:

from multiprocessing import Pool
import numpy as np
from typing import List, Tuple


class ComplexClass:
    ... # defined before


def handle(instance: ComplexClass, other_params):
    # main logic, we only read the value in instance,
    # and will not modify the instance
    ...


def main():
    instance = ...  # get the instance of ComplexClass
    params_list = [(instance, 0), (instance, 1), (instance, 2), (instance, 3)]
    with Pool(4) as pool:
        res = pool.starmap(handle, params_list)


if __name__ == '__main__':
    main()

My questions:

  1. Is there any way I can share one copy of above object in all different processes? I've read multiple posts here. Many posts are outdated and I only found how to share single value (use multiprocessing.Value), single list multiprocessing.Array or single numpy.array. How to share above generic complex object?

PS: I cannot use multithreading in my example, since the codebase in my current company is CPython as interpreter and it has GIL. I cannot change to other interpreter.

maplemaple
  • 1,297
  • 6
  • 24
  • If you really only found Value and Array these may be helpful: https://stackoverflow.com/questions/47837206/sharing-a-complex-python-object-in-memory-between-separate-processes https://stackoverflow.com/questions/10721915/shared-memory-objects-in-multiprocessing . multiprocessing also has shared_memory in python 3.8+ but I'm not really familiar with how it works. – Mateus Reis Dec 06 '21 at 03:23
  • @MateusReis I'm not advanced user. I'm new to python multiprocessing. From the doc, I can only use in simple case. I don't know how to generalize it to this complex case. Thank you. – maplemaple Dec 06 '21 at 04:05

0 Answers0