I am trying to distribute jobs over several CUDA devices such that the total number of jobs running at any time is no more than the number of CPU cores available. To do this, I determine the number of available 'slots' on each device and build a list holding those slots. If I have 6 CPU cores and two CUDA devices (0 and 1), then AVAILABLE_SLOTS = [0, 1, 0, 1, 0, 1]. In my worker function I pop a slot off the list and save it to a variable, set the CUDA_VISIBLE_DEVICES env var in the subprocess call, and then append the slot back to the list. This has been working so far, but I want to avoid race conditions.
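For reference, building the slot list just means cycling over the device ids until there is one slot per core. A minimal sketch of that helper (my real build_available_gpu_slots may differ slightly; this just shows the shape):

import itertools

def build_available_gpu_slots(pool_size, cuda_devices):
    # Interleave device ids round-robin until there is one slot per CPU core,
    # e.g. pool_size=6, cuda_devices=[0, 1] -> [0, 1, 0, 1, 0, 1]
    return list(itertools.islice(itertools.cycle(cuda_devices), pool_size))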
Current code is as follows:
import multiprocessing
import os
import subprocess

def work(cmd):
    # Take a free slot, pin the job to that device, then hand the slot back.
    slot = AVAILABLE_GPU_SLOTS.pop()
    exit_code = subprocess.call(cmd, shell=False, env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot)))
    AVAILABLE_GPU_SLOTS.append(slot)
    return exit_code
if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    mols_to_be_run = [name for name in os.listdir(YANK_FILES)
                      if os.path.isdir(os.path.join(YANK_FILES, name))]
    cmds = build_cmd(mols_to_be_run)
    cuda = get_cuda_devices()
    AVAILABLE_GPU_SLOTS = build_available_gpu_slots(pool_size, cuda)
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
    pool.map(work, cmds)
    pool.close()
    pool.join()
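(For context, get_cuda_devices returns the list of device ids. My real helper differs, but a hypothetical stand-in could count lines from nvidia-smi:)

def get_cuda_devices():
    # Hypothetical stand-in: one device id per line of `nvidia-smi --list-gpus` output.
    out = subprocess.check_output(['nvidia-smi', '--list-gpus']).decode()
    return list(range(len(out.strip().splitlines())))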
Can I simply declare lock = multiprocessing.Lock() at the same level as AVAILABLE_GPU_SLOTS, put it in cmds so each worker receives it, and then inside work() do

with lock:
    slot = AVAILABLE_GPU_SLOTS.pop()

# subprocess stuff

with lock:
    AVAILABLE_GPU_SLOTS.append(slot)

or do I need a manager list? Alternatively, maybe there's a better solution to what I'm doing.
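To make the manager option concrete, here is roughly what I have in mind (a sketch, untested; init_worker and the initializer/initargs wiring are my guesses at how the shared proxies would reach each worker):

import multiprocessing

def init_worker(slots, lock):
    # Runs once in each pool worker: stash the shared proxies in globals
    # so work() can see them.
    global AVAILABLE_GPU_SLOTS, SLOT_LOCK
    AVAILABLE_GPU_SLOTS = slots
    SLOT_LOCK = lock

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_slots = manager.list(build_available_gpu_slots(pool_size, cuda))
    slot_lock = manager.Lock()
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2,
                                initializer=init_worker, initargs=(shared_slots, slot_lock))
    pool.map(work, cmds)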