I've run into a practical problem in my current work. I need to share a complex Python object, such as the following structure, between multiple processes:
import numpy as np
from typing import List, Tuple

class ComplexClass:
    def __init__(
        self,
        u: np.ndarray,
        v: Tuple[Tuple[int, float], ...],  # variable-length tuple of (int, float) pairs
        w: str,
        x: int,
        y: List[Tuple[int, np.ndarray]],  # all numpy arrays here have the same shape
        z: List[List[int]],  # inner lists all have the same length
    ):
        self.u = u
        self.v = v
        self.w = w
        self.x = x
        self.y = y
        self.z = z
My server has 128 GB of RAM, and one object like the above is ~50 GB. We need at least 4 processes handling this object to meet a timeout requirement. I found that the object ends up with 4 copies across the 4 processes, which runs out of memory. Even if I apply to run it on a server with more RAM, the overhead of copying the object is huge. In my current use case the object is read-only in all 4 processes, so there are no data races and I don't need any locks.
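The copy-per-worker behaviour is easy to demonstrate with a tiny stand-in object (the dict below is hypothetical, just to make the per-task pickling visible):

```python
from multiprocessing import Pool

def mutate(obj):
    # Each worker receives its own unpickled copy of obj,
    # so this mutation is local to the worker process.
    obj['x'] = 999
    return obj['x']

def main():
    obj = {'x': 0}  # tiny stand-in for the ~50 GB instance
    with Pool(2) as pool:
        results = pool.map(mutate, [obj, obj])
    # Workers mutated their own copies; the parent's object is untouched.
    return results, obj['x']
```

Here `main()` returns `([999, 999], 0)`: the workers each saw (and changed) a private copy, while the parent's object kept `x == 0`, confirming that `Pool` serializes the argument for every task.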
The whole structure of the problem looks roughly like this:
from multiprocessing import Pool
import numpy as np
from typing import List, Tuple

class ComplexClass:
    ...  # defined before

def handle(instance: ComplexClass, other_params):
    # main logic: we only read values from instance
    # and never modify it
    ...

def main():
    instance = ...  # get the instance of ComplexClass
    params_list = [(instance, 0), (instance, 1), (instance, 2), (instance, 3)]
    with Pool(4) as pool:
        res = pool.starmap(handle, params_list)

if __name__ == '__main__':
    main()
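One possible workaround, assuming a Unix server: with the `'fork'` start method, any data the parent creates before building the pool is inherited by the children via copy-on-write, so nothing is pickled per task. A minimal sketch (the dict here stands in for the real `ComplexClass` instance):

```python
import numpy as np
from multiprocessing import get_context

_instance = None  # set in the parent before forking; inherited by children

def handle(i):
    # Children read the inherited module-level global;
    # no pickling and no explicit copy of the big object.
    return float(_instance['u'][i])

def main():
    global _instance
    _instance = {'u': np.arange(8, dtype=np.float64)}  # stands in for the ~50 GB object
    ctx = get_context('fork')  # 'fork' is required; 'spawn' would re-import without _instance
    with ctx.Pool(4) as pool:
        res = pool.map(handle, range(4))
    return res

if __name__ == '__main__':
    print(main())
```

One caveat: CPython's reference counting writes to the object headers, which makes the kernel copy those pages even for read-only access. The bulk of a numpy-heavy object lives in the array data buffers, though, and those pages stay shared as long as they are only read.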
My questions:
- Is there any way to share one copy of the object above across all processes? I've read multiple posts here, but many are outdated, and I only found how to share a single value (multiprocessing.Value), a single list (multiprocessing.Array), or a single numpy.array. How do I share a generic complex object like the one above?
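For the numpy parts specifically, the stdlib `multiprocessing.shared_memory` module (Python 3.8+) can back an array with a single shared buffer that every process attaches to by name. A minimal sketch (the function names are illustrative, not a complete solution for the full object):

```python
import numpy as np
from multiprocessing import Pool, shared_memory

def worker(args):
    shm_name, shape, dtype = args
    # Attach to the existing block by name; no data is copied.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        return float(arr.sum())  # read-only access
    finally:
        shm.close()

def main():
    data = np.arange(6, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    try:
        # Copy the data into the shared buffer once, in the parent.
        view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        view[:] = data
        with Pool(2) as pool:
            res = pool.map(worker, [(shm.name, data.shape, data.dtype.str)] * 2)
        return res
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block
```

This shares only the array buffers; the surrounding Python containers (`v`, `w`, `x`, the list structure of `y` and `z`) would still need to be reconstructed or passed separately in each process, which is cheap compared to the array data.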
PS: I cannot use multithreading in my example, since my company's codebase runs on CPython as the interpreter, which has the GIL, and I cannot switch to another interpreter.