
I am trying to return values from subprocesses, but these values are unfortunately unpicklable. I used global variables with the threading module successfully, but I have not been able to retrieve updates made in subprocesses when using the multiprocessing module. I hope I'm missing something.

The results printed at the end always match the initial values of dataDV03 and dataDV04. The subprocesses update these global variables, but the globals remain unchanged in the parent.

import multiprocessing

# NOT ABLE to get python to return values in passed variables.

ants = ['DV03', 'DV04']
dataDV03 = ['', '']
dataDV04 = {'driver': '', 'status': ''}


def getDV03CclDrivers(lib):  # strategy 1: modify a global directly
    global dataDV03
    dataDV03[1] = 1
    dataDV03[0] = 0
    # eval('CCL.' + lib + '.' + lib + '( "DV03" )') -- these are unpicklable instantiations


def getDV04CclDrivers(lib, dataDV04):  # strategy 2: the global is passed as an argument
    dataDV04['driver'] = 0  # eval('CCL.' + lib + '.' + lib + '( "DV04" )')


if __name__ == "__main__":

    jobs = []
    if 'DV03' in ants:
        j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',))
        jobs.append(j)

    if 'DV04' in ants:
        j = multiprocessing.Process(target=getDV04CclDrivers, args=('LORR', dataDV04))
        jobs.append(j)

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print 'Results:\n'
    print 'DV03', dataDV03
    print 'DV04', dataDV04

I cannot post an answer to my own question, so I will edit the original instead.

Here is the object that is not picklable:

In [1]: from CCL import LORR
In [2]: lorr=LORR.LORR('DV20', None)
In [3]: lorr
Out[3]: <CCL.LORR.LORR instance at 0x94b188c>

This is the error returned when I use a multiprocessing.Pool to return the instance back to the parent:

Thread getCcl (('DV20', 'LORR'),)
Process PoolWorker-1:
Traceback (most recent call last):
  File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/pool.py", line 71, in worker
    put((job, i, result))
  File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/queues.py", line 366, in put
    return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects
In [5]: dir(lorr)
Out[5]:
['GET_AMBIENT_TEMPERATURE',
 'GET_CAN_ERROR',
 'GET_CAN_ERROR_COUNT',
 'GET_CHANNEL_NUMBER',
 'GET_COUNT_PER_C_OP',
 'GET_COUNT_REMAINING_OP',
 'GET_DCM_LOCKED',
 'GET_EFC_125_MHZ',
 'GET_EFC_COMB_LINE_PLL',
 'GET_ERROR_CODE_LAST_CAN_ERROR',
 'GET_INTERNAL_SLAVE_ERROR_CODE',
 'GET_MAGNITUDE_CELSIUS_OP',
 'GET_MAJOR_REV_LEVEL',
 'GET_MINOR_REV_LEVEL',
 'GET_MODULE_CODES_CDAY',
 'GET_MODULE_CODES_CMONTH',
 'GET_MODULE_CODES_DIG1',
 'GET_MODULE_CODES_DIG2',
 'GET_MODULE_CODES_DIG4',
 'GET_MODULE_CODES_DIG6',
 'GET_MODULE_CODES_SERIAL',
 'GET_MODULE_CODES_VERSION_MAJOR',
 'GET_MODULE_CODES_VERSION_MINOR',
 'GET_MODULE_CODES_YEAR',
 'GET_NODE_ADDRESS',
 'GET_OPTICAL_POWER_OFF',
 'GET_OUTPUT_125MHZ_LOCKED',
 'GET_OUTPUT_2GHZ_LOCKED',
 'GET_PATCH_LEVEL',
 'GET_POWER_SUPPLY_12V_NOT_OK',
 'GET_POWER_SUPPLY_15V_NOT_OK',
 'GET_PROTOCOL_MAJOR_REV_LEVEL',
 'GET_PROTOCOL_MINOR_REV_LEVEL',
 'GET_PROTOCOL_PATCH_LEVEL',
 'GET_PROTOCOL_REV_LEVEL',
 'GET_PWR_125_MHZ',
 'GET_PWR_25_MHZ',
 'GET_PWR_2_GHZ',
 'GET_READ_MODULE_CODES',
 'GET_RX_OPT_PWR',
 'GET_SERIAL_NUMBER',
 'GET_SIGN_OP',
 'GET_STATUS',
 'GET_SW_REV_LEVEL',
 'GET_TE_LENGTH',
 'GET_TE_LONG_FLAG_SET',
 'GET_TE_OFFSET_COUNTER',
 'GET_TE_SHORT_FLAG_SET',
 'GET_TRANS_NUM',
 'GET_VDC_12',
 'GET_VDC_15',
 'GET_VDC_7',
 'GET_VDC_MINUS_7',
 'SET_CLEAR_FLAGS',
 'SET_FPGA_LOGIC_RESET',
 'SET_RESET_AMBSI',
 'SET_RESET_DEVICE',
 'SET_RESYNC_TE',
 'STATUS',
 '_HardwareDevice__componentName',
 '_HardwareDevice__hw',
 '_HardwareDevice__stickyFlag',
 '_LORRBase__logger',
 '__del__',
 '__doc__',
 '__init__',
 '__module__',
 '_devices',
 'clearDeviceCommunicationErrorAlarm',
 'getControlList',
 'getDeviceCommunicationErrorCounter',
 'getErrorMessage',
 'getHwState',
 'getInternalSlaveCanErrorMsg',
 'getLastCanErrorMsg',
 'getMonitorList',
 'hwConfigure',
 'hwDiagnostic',
 'hwInitialize',
 'hwOperational',
 'hwSimulation',
 'hwStart',
 'hwStop',
 'inErrorState',
 'isMonitoring',
 'isSimulated']

Buoy
  • "these values are unpickleable" -- do you mean the things you're packing into your globals are not able to be pickled? If that's the case, then you can't use subprocess (AFAIK) since that is how information is passed between processes. If the data is able to be pickled, you'll want to use a `Manager`. – mgilson Jun 15 '12 at 18:21
  • You're not really supposed to "post" anything but answers to your question. So it's good that you edited instead; that's the right thing to do in this case. – senderle Jun 15 '12 at 20:47
  • Another alternative I recently discovered is the `apply_async` callback. The callback gets executed in the parent process, which means anything the subprocess returns can be passed to the callback, and the callback can then mutate the globals. This however requires the use of a `global variableName` declaration at the top of the callback function. – CMCDragonkai Sep 29 '16 at 19:18

5 Answers


When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.

Additionally, most of the abstractions that multiprocessing provides use pickle to transfer data. All data transferred using proxies must be pickleable; that includes all the objects that a Manager provides. Relevant quotations (my emphasis):

Ensure that the arguments to the methods of proxies are picklable.

And (in the Manager section):

Other processes can access the shared objects by using proxies.

Queues also require pickleable data; the docs don't say so, but a quick test confirms it:

import multiprocessing
import pickle

class Thing(object):
    def __getstate__(self):
        print 'got pickled'
        return self.__dict__
    def __setstate__(self, state):
        print 'got unpickled'
        self.__dict__.update(state)

q = multiprocessing.Queue()
p = multiprocessing.Process(target=q.put, args=(Thing(),))
p.start()
print q.get()
p.join()

Output:

$ python mp.py 
got pickled
got unpickled
<__main__.Thing object at 0x10056b350>

The one approach that might work for you, if you really can't pickle the data, is to find a way to store it as a ctypes object; a reference to the shared memory can then be passed to a child process. This seems pretty dodgy to me; I've never done it. But it might be a possible solution for you.
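For example, if the relevant state reduces to plain C types, something like this works (a minimal sketch; it does not help with objects that hold thread locks):

import multiprocessing

def worker(flag):
    flag.value = 1  # backed by shared memory, so the parent sees this

if __name__ == '__main__':
    flag = multiprocessing.Value('i', 0)  # a shared C int, initially 0
    p = multiprocessing.Process(target=worker, args=(flag,))
    p.start()
    p.join()
    print flag.value  # prints 1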

Given your update, it seems like you need to know a lot more about the internals of a LORR. Is LORR a class? Can you subclass from it? Is it a subclass of something else? What's its MRO? (Try LORR.__mro__ and post the output if it works.) If it's a pure python object, it might be possible to subclass it, creating a __setstate__ and a __getstate__ to enable pickling.

Another approach might be to figure out how to get the relevant data out of a LORR instance and pass it via a simple string. Since you say that you really just want to call the methods of the object, why not just do so using Queues to send messages back and forth? In other words, something like this (schematically):

Main Process              Child 1                       Child 2
                          LORR 1                        LORR 2 
child1_in_queue     ->    get message 'foo'
                          call 'foo' method
child1_out_queue    <-    return foo data string
child2_in_queue                   ->                    get message 'bar'
                                                        call 'bar' method
child2_out_queue                  <-                    return bar data string
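A minimal sketch of that pattern, assuming the `CCL.LORR` import from the question and using `GET_STATUS` as a stand-in for whatever methods you actually need (the return values must themselves be picklable):

import multiprocessing

def lorr_worker(antenna, in_queue, out_queue):
    from CCL import LORR
    lorr = LORR.LORR(antenna, None)  # the unpicklable object never leaves this process
    while True:
        method_name = in_queue.get()
        if method_name is None:  # sentinel: shut the worker down
            break
        result = getattr(lorr, method_name)()
        out_queue.put((method_name, result))  # the result must be picklable

if __name__ == '__main__':
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=lorr_worker, args=('DV20', in_q, out_q))
    p.start()

    in_q.put('GET_STATUS')  # ask the child to call lorr.GET_STATUS()
    print out_q.get()       # blocks until the child replies

    in_q.put(None)          # tell the child to exit
    p.join()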
senderle
  • @user1459256, consider my edit (at the bottom of my post). We need more information about `LORR` objects to develop a feasible approach. – senderle Jun 15 '12 at 21:01
  • @user1459256, OK, when you say you need to call the methods later to get the data that is current when called -- it makes me think that you don't really need to transfer `LORR` objects at all, but that you rather need to transfer the data returned by a method on a `LORR` object. But that should be easy! Just use message passing via a `Queue` to tell the child process to call a particular method, and then have the child return a value via a return `Queue`. – senderle Jun 15 '12 at 21:36
  • @user1459256, see my most recent edit and let me know if the proposed solution there would work. – senderle Jun 15 '12 at 21:44
  • Sorry, I've been slow digesting all of this. The LORR object does not have an __mro__. It is a python interface to C++ code. I need the objects to persist for minutes to hours. I was thinking threads should not be managed for extended periods. Since I cannot pass the object out of the subprocess I would need to keep the subprocesses active for hours while I pass the results of the methods when called. Is this reasonable? If so it seems this would be a solution to the problems. – Buoy Jun 15 '12 at 22:18
  • @user1459256: Personally, I don't see why long-running subprocesses are much of an issue if you are in constant communication with them. (or long running threads) – jdi Jun 15 '12 at 22:28
  • @jdi, senderle: Thanks a lot for sticking with this. I will build a queue solution with keeping the subprocess active. This seems very promising! – Buoy Jun 15 '12 at 22:40
  • @senderle multiprocessing.Queue shows it is a proxy-type object – Elysiumplain May 06 '21 at 21:03

@DBlas gives you a quick URL and reference to the Manager class in an answer, but I think it's still a bit vague, so I thought it might be helpful for you to just see it applied...

import multiprocessing
from multiprocessing import Manager

ants = ['DV03', 'DV04']

def getDV03CclDrivers(lib, data_list):  # receives a manager list
    data_list[1] = 1
    data_list[0] = 0

def getDV04CclDrivers(lib, data_dict):  # receives a manager dict
    data_dict['driver'] = 0


if __name__ == "__main__":

    manager = Manager()
    dataDV03 = manager.list(['', ''])
    dataDV04 = manager.dict({'driver': '', 'status': ''})

    jobs = []
    if 'DV03' in ants:
        j = multiprocessing.Process(
                target=getDV03CclDrivers, 
                args=('LORR', dataDV03))
        jobs.append(j)

    if 'DV04' in ants:
        j = multiprocessing.Process(
                target=getDV04CclDrivers, 
                args=('LORR', dataDV04))
        jobs.append(j)

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print 'Results:\n'
    print 'DV03', dataDV03
    print 'DV04', dataDV04

Because multiprocessing actually uses separate processes, you cannot simply share global variables: they live in completely different "spaces" in memory. What you do to a global in one process will not be reflected in another. Though I admit it seems confusing, since the way you see it, it's all living right there in the same piece of code, so "why shouldn't those methods have access to the global"? It's harder to wrap your head around the idea that they will be running in different processes.

The Manager class exists to act as a proxy for data structures that can shuttle info back and forth between processes for you. You create a special dict and list from a manager, pass them into your functions, and operate on them locally.

Un-pickle-able data

For your specialized LORR object, you might need to create something like a proxy that can represent the picklable state of the instance.

Not super robust or heavily tested, but it gives you the idea.

class LORRProxy(object):

    def __init__(self, lorrObject=None):
        self.instance = lorrObject

    def __getstate__(self):
        # how to get the state data out of a lorr instance
        inst = self.instance
        state = dict(
            foo = inst.a,
            bar = inst.b,
        )
        return state

    def __setstate__(self, state):
        # rebuild a LORR instance from the saved state
        lorr = LORR.LORR()
        lorr.a = state['foo']
        lorr.b = state['bar']
        self.instance = lorr
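
Assuming the proxy above (with `a`, `b`, `foo`, and `bar` standing in for real LORR attributes), a pickle round trip would look something like this:

import pickle

proxy = LORRProxy(lorr)        # wrap the live, unpicklable instance
data = pickle.dumps(proxy)     # only the state dict from __getstate__ is pickled
restored = pickle.loads(data)  # __setstate__ rebuilds a LORR on the other side
print restored.instance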
jdi
  • But `Manager` objects use `pickle` to transfer data. So if the OP's data really is unpickleable, I think this won't work. – senderle Jun 15 '12 at 18:40
  • @senderle: That may be true, but then again the OP has not yet provided any example of what the unpickleable data is. I can only answer what I see :-) – jdi Jun 15 '12 at 18:42
  • Thanks for the detailed answers. According to the replies, the roadblocks are: 1) the values I need to pass to the parent raise pickle errors, and 2) the subprocesses' separate memory spaces prevent passing back global variables. – Buoy Jun 15 '12 at 20:15
  • @user1459256: senderle shows an example of how to create a class structure that can be pickled. So you can combine that with either the Queue suggestion or with mine by adding them into manager dict and list proxy objects. – jdi Jun 15 '12 at 20:20
  • I cannot change the underlying code that I use to instantiate the object. It is deep in the ALMA observatory system and would take heroic effort to get someone to change it. Can an arbitrary class instantiation be serialized or converted to a ctype? – Buoy Jun 15 '12 at 20:51
  • @user1459256: You don't have to redefine the original class. You just need to define a way to serialize it: something that saves and restores the state of the object to basic data structures. You can have a class that takes in your special object and acts as the proxy to serialize it. – jdi Jun 15 '12 at 20:55
  • I tried json but get this error: TypeError: is not JSON serializable – Buoy Jun 15 '12 at 21:01
  • @user1459256: Well, yes, you can't simply pass the object to a serializer because of the very nature that it is NOT serializable. You need to actually define which attributes of this object need to be saved into something like a dict or list, then serialize that. – jdi Jun 15 '12 at 21:05
  • @jdi: can you point me to an example how to implement the class as proxy for the serializing? – Buoy Jun 15 '12 at 21:10
  • @jdi: okay, I will try to work on saving the methods to a manager dict. – Buoy Jun 15 '12 at 21:12
  • @user1459256: I just added a rough example – jdi Jun 15 '12 at 21:14
  • @jdi: Hm, I need to recover the methods, but these are also not picklable. This is new for me. I need to call the object methods from the parent. Is this possible with multiprocessing? I can use the threading module, but the performance is not much better than sequentially instantiating the objects. – Buoy Jun 15 '12 at 21:20
  • @jdi: Fantastic! Thanks a lot for the example. I will study and try to implement. – Buoy Jun 15 '12 at 21:24
  • @senderle, jdi: I need the object methods. I need to call the methods later to get the data that is current when called. Am I running myself into a dead end? – Buoy Jun 15 '12 at 21:33
  • @user1459256: You don't need to worry about the methods, only the data that is custom to the state of that instance. When it gets deserialized and the state is applied back to a `LORR` instance, the methods are all there. – jdi Jun 15 '12 at 21:38
  • @jdi: Sorry, I need to add more details. I need to instantiate hundreds of these objects but the instantiation process has been getting very slow as the system matures. Time studies show that multiprocessing running across the 8 cores saves considerable time, but I need the objects instantiated in the subprocesses to take advantage of the speedup. The proxy appears to instantiate the object twice so will double the sequential time if I understand it correctly. – Buoy Jun 15 '12 at 22:02
  • @user1459256: Generally, serializing an object and then deserializing it results in two instances. You start with a source, transform it into something that can go on the wire, then transform it back into an instance on the other end of the pipe. The only way to save the second instantiation is to create the object in the main process and send just its data over to the subprocess, which uses that data and then sends data back to be applied to the state of the original instance. If both the main and sub processes need object access, then you don't have a choice with multiprocessing. – jdi Jun 15 '12 at 22:25

When using multiprocessing, the only way to pass objects between processes is to use a Queue or a Pipe; globals are not shared. Objects must be pickleable, so multiprocessing won't help you here.
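
For data that is picklable, the Pipe route looks like this (a minimal sketch):

import multiprocessing

def worker(conn):
    conn.send({'driver': 0, 'status': 'ok'})  # the sent object is pickled
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    print parent_conn.recv()  # {'driver': 0, 'status': 'ok'}
    p.join()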

ecatmur
  • multiprocessing has a Manager for shuttling data around between processes. – jdi Jun 15 '12 at 18:30
  • are Queues again the only way to pass values to the parent function, or only for same-level functions? – erogol Nov 07 '13 at 13:41
  • The documentation (https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes) says "shared objects will be process and thread-safe". What does this mean exactly? Am I right to say that the respective process/thread will internally acquire a lock before updating the value, change the value, then release the lock? – variable Nov 05 '19 at 04:47

You could also use a multiprocessing Array. This allows you to have shared state between processes and is probably the closest thing to a global variable.

At the top of main, declare an Array. The first argument 'i' says it will be integers. The second argument gives the initial values:

shared_dataDV03 = multiprocessing.Array('i', (0, 0))  # a shared array of two ints

Then pass this array to the process as an argument:

j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR', shared_dataDV03))

You have to receive the array argument in the function being called, and then you can modify it within the function:

def getDV03CclDrivers(lib, arr):  # the shared Array arrives as an argument
    arr[1] = 1
    arr[0] = 0

The array is shared with the parent, so you can print out the values at the end in the parent:

print 'DV03', shared_dataDV03[:]

And it will show the changes:

DV03 [0, 1]
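
Put together, a complete runnable version of the DV03 case from the question looks like this:

import multiprocessing

def getDV03CclDrivers(lib, arr):
    arr[1] = 1
    arr[0] = 0

if __name__ == '__main__':
    shared_dataDV03 = multiprocessing.Array('i', (0, 0))  # two shared ints
    j = multiprocessing.Process(target=getDV03CclDrivers,
                                args=('LORR', shared_dataDV03))
    j.start()
    j.join()
    print 'DV03', shared_dataDV03[:]  # prints: DV03 [0, 1]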
Paul

I use p.map() to spin off a number of processes to remote servers and print the results when they come back at unpredictable times:

from multiprocessing import Pool

Servers = [...]  # the list of servers to query
p = Pool(len(Servers))
p.map(DoIndividualSummary, Servers)

This worked fine if DoIndividualSummary used print for the results, but the overall result was in unpredictable order, which made interpretation difficult. I tried a number of approaches to use global variables but ran into problems. Finally, I succeeded with sqlite3.

Before p.map(), open a sqlite connection and create a table:

import sqlite3
conn = sqlite3.connect('servers.db')  # keep conn around for commit and close
db = conn.cursor()
try:
    db.execute('''DROP TABLE servers''')
except sqlite3.OperationalError:
    pass  # the table did not exist yet
db.execute('''CREATE TABLE servers (server text, serverdetail text, readings text)''')
conn.commit()

Then, when returning from DoIndividualSummary(), save the results into the table:

db.execute('''INSERT INTO servers VALUES (?,?,?)''',
           (server, serverdetail, readings))
conn.commit()
return

After the map() statement, print the results:

db.execute('''SELECT * FROM servers ORDER BY server''')
rows = db.fetchall()
for server, serverdetail, readings in rows:
    print serverdetail, readings

It may seem like overkill, but it was simpler for me than the recommended solutions.
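
One caveat: a sqlite3 connection should not be shared across a fork, so the safest arrangement is for each worker to open its own connection (a sketch; `query_server` is a hypothetical placeholder for the real work inside DoIndividualSummary):

import sqlite3

def DoIndividualSummary(server):
    # open a fresh connection inside the child process; the timeout waits
    # out writers from other processes, since sqlite serializes writes
    conn = sqlite3.connect('servers.db', timeout=30)
    db = conn.cursor()
    serverdetail, readings = query_server(server)  # hypothetical helper
    db.execute('''INSERT INTO servers VALUES (?,?,?)''',
               (server, serverdetail, readings))
    conn.commit()
    conn.close()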

Eva Smith