
I have a requirement wherein I have to keep certain data (which can be anything: a pandas DataFrame, a trained ML model, etc.) always available to all daemon processes created using the multiprocessing module. Mostly I won't need to modify this data, only read it. The multiprocessing module provides various mechanisms like Value and Manager, as explained in this answer. Does it make sense to put such data in a separate module and access it from the different processes, instead of using Value or Array?
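For comparison, here is a minimal sketch of the `Value` mechanism mentioned above (my illustration, not part of the original setup). `Value` wraps a single ctypes object in shared memory, which is exactly why it cannot hold something like a DataFrame or a trained model:

from multiprocessing import Process, Value

def show(counter):
    print(counter.value)  # every process reads the same shared-memory int

if __name__ == "__main__":
    counter = Value("i", 42)  # "i" = one C int in shared memory
    p = Process(target=show, args=(counter,))
    p.start()
    p.join()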

common_data.py (contains common data to be shared by processes)

worker_specific_conf = None  # set separately inside each worker process
common_conf = None           # set once by the parent before the pool is created
shared_list = [1, 2, 3]      # named to avoid shadowing the builtin `list`

libX.py (contains functions which simply print info from common_data.py)

import common_data as cmn
from os import getpid  # portable, unlike the Unix-only posix module

def functionX():
    print(str(getpid()) + " : " + str(cmn.worker_specific_conf) + " : " + str(cmn.common_conf))

def functionY():
    print(str(getpid()) + " : " + str(cmn.worker_specific_conf) + " : " + str(cmn.common_conf))

def functionList():
    print(str(getpid()) + " : " + str(cmn.worker_specific_conf) + " : " + str(cmn.shared_list))

def functionDf():
    print(str(getpid()) + " : " + str(cmn.worker_specific_conf) + " : " + str(cmn.df))

workers.py (run this; it creates multiple workers which access the data from common_data.py via libX.py)

import uuid
import libX
import multiprocessing as mu
from time import sleep
import common_data as cmn
from random import random
import pandas as pd

def worker(a):
    sleep(random())
    cmn.worker_specific_conf = uuid.uuid4()  # per-process value, set after fork
    libX.functionX()
    sleep(random())
    libX.functionY()
    sleep(random())
    libX.functionList()
    sleep(random())
    libX.functionDf()

if __name__ == "__main__":  # guard keeps this block out of spawned children
    data = [1, 2, 3, 4, 5]
    df = pd.DataFrame(data)
    cmn.df = df  # adding data to the data-sharing module dynamically

    cmn.common_conf = random()
    cmn.shared_list.append(4)
    pool = mu.Pool(processes=2)
    pool.map(worker, range(3))
    pool.close()
    pool.join()

Is this approach OK if I just want to be able to read shared data from different processes? (A start-method caveat is sketched after the output below.)


Output

6732 : d08673d2-1d8f-4f87-b9ad-d1389ea564d6 : 0.3915408966829501
6732 : d08673d2-1d8f-4f87-b9ad-d1389ea564d6 : 0.3915408966829501
6732 : d08673d2-1d8f-4f87-b9ad-d1389ea564d6 : [1, 2, 3, 4]
12152 : af373f13-35b5-47b2-9736-5b19ee028c9c : 0.3915408966829501
6732 : d08673d2-1d8f-4f87-b9ad-d1389ea564d6 :    0
0  1
1  2
2  3
3  4
4  5
6732 : c629d9c3-f439-4818-ac79-1340f98470ea : 0.3915408966829501
12152 : af373f13-35b5-47b2-9736-5b19ee028c9c : 0.3915408966829501
12152 : af373f13-35b5-47b2-9736-5b19ee028c9c : [1, 2, 3, 4]
6732 : c629d9c3-f439-4818-ac79-1340f98470ea : 0.3915408966829501
12152 : af373f13-35b5-47b2-9736-5b19ee028c9c :    0
0  1
1  2
2  3
3  4
4  5
6732 : c629d9c3-f439-4818-ac79-1340f98470ea : [1, 2, 3, 4]
6732 : c629d9c3-f439-4818-ac79-1340f98470ea :    0
0  1
1  2
2  3
3  4
4  5
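One caveat with the module-global approach, as a minimal sketch (hypothetical names `cache` and `reader`): it relies on the fork start method, under which children inherit the parent's module state at fork time. Under the spawn start method, each worker re-imports the module fresh, so anything assigned after import (like cmn.df above) would be missing:

import multiprocessing as mp

cache = {}  # stand-in for common_data's module-level globals

def reader(_):
    # fork: the child inherits the parent's mutation and sees the value.
    # spawn: the module is re-imported, cache is recreated empty, so None.
    return cache.get("model")

if __name__ == "__main__":
    cache["model"] = "loaded once in the parent"  # mutation after import
    with mp.Pool(processes=2) as pool:
        print(pool.map(reader, range(3)))

The workers in the output above do see cmn.df and cmn.common_conf set by the parent, which indicates a fork start method.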
    _"Mostly I wont be requiring to modify this data but only use it."_ Well, when you _do_ modify the data, I don't expect that modification to be visible in any other process. AFAIK, each process gets a completely independent copy. Take a look at https://docs.python.org/3.7/library/multiprocessing.html#sharing-state-between-processes for some common approaches to sharing state between processes. – Kevin Nov 01 '18 at 12:33
  • @Kevin I need different types of data (pandas DataFrame, h5 file) to be shared. **Q1.** Is it possible with multiprocessing's `Value`? I don't think so. So I have written [small code](https://pastebin.com/KM5nmkJj). There is a `common_data` module which holds data shared by all daemons. A server listens on a port. If it receives a message starting with `add`, it adds a random key-value to `common_data.map`. If it receives a message not starting with `add`, it spawns a new daemon. The new daemon is able to access all key-values added earlier to `common_data.map`. – MsA Nov 05 '18 at 14:22
  • [..continued] Now the issue is that I have to add key-values on the server's main thread. If I add key-values in a new daemon, other daemons won't be able to access them. **Q3.** Is it because those values become scoped to the thread? **Q4.** Why are daemons able to access values added by the server? Is there any workaround? **Q5.** Also, all those new threads are re-running common_data.py, which can be an overhead if there is a lot of data? **Q6.** Is there any other option for my requirement? – MsA Nov 05 '18 at 14:23
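A minimal sketch of the Manager approach from the docs linked in the comments above, which addresses the visibility questions: a `Manager().dict()` proxy lives in a separate manager process, so key-values added by any worker are visible to every other process:

import multiprocessing as mp

def worker(shared_map, key):
    shared_map[key] = key * key  # update is visible to every process

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared_map = manager.dict()  # proxy to a dict in the manager process
        procs = [mp.Process(target=worker, args=(shared_map, k)) for k in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(shared_map))  # {0: 0, 1: 1, 2: 4}

This trades the zero-copy reads of fork-inherited globals for mutability: every lookup is a round trip to the manager process, which can be slow for large objects like DataFrames.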
