I am trying to distribute jobs over several CUDA devices such that the total number of jobs running at any time is no more than the number of CPU cores available. To do this, I determine the number of available 'slots' on each device and build a list holding those slots. If I have 6 CPU cores and two CUDA devices (0 and 1), then AVAILABLE_SLOTS = [0, 1, 0, 1, 0, 1]. In my worker function I pop a slot off the list and save it to a variable, set the CUDA_VISIBLE_DEVICES env var in the subprocess call, and then append the slot back to the list. This has been working so far, but I want to avoid race conditions.
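For reference, building the slot list just means cycling over the device ids until there is one slot per core. A minimal sketch of that helper (my real build_available_gpu_slots may differ slightly; this just shows the shape):

import itertools

def build_available_gpu_slots(pool_size, cuda_devices):
    # Interleave device ids round-robin until there is one slot per CPU core,
    # e.g. pool_size=6, cuda_devices=[0, 1] -> [0, 1, 0, 1, 0, 1]
    return list(itertools.islice(itertools.cycle(cuda_devices), pool_size))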
Current code is as follows:
import multiprocessing
import os
import subprocess

def work(cmd):
    # Take a free slot, pin the job to that device, then hand the slot back.
    slot = AVAILABLE_GPU_SLOTS.pop()
    exit_code = subprocess.call(cmd, shell=False, env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot)))
    AVAILABLE_GPU_SLOTS.append(slot)
    return exit_code
if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    mols_to_be_run = [name for name in os.listdir(YANK_FILES)
                      if os.path.isdir(os.path.join(YANK_FILES, name))]
    cmds = build_cmd(mols_to_be_run)
    cuda = get_cuda_devices()
    AVAILABLE_GPU_SLOTS = build_available_gpu_slots(pool_size, cuda)
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
    pool.map(work, cmds)
    pool.close()
    pool.join()
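(For context, get_cuda_devices returns the list of device ids. My real helper differs, but a hypothetical stand-in could count lines from nvidia-smi:)

def get_cuda_devices():
    # Hypothetical stand-in: one device id per line of `nvidia-smi --list-gpus` output.
    out = subprocess.check_output(['nvidia-smi', '--list-gpus']).decode()
    return list(range(len(out.strip().splitlines())))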
Can I simply declare lock = multiprocessing.Lock() at the same level as AVAILABLE_GPU_SLOTS, put it in cmds so each worker receives it, and then inside work() do

with lock:
    slot = AVAILABLE_GPU_SLOTS.pop()

# subprocess stuff

with lock:
    AVAILABLE_GPU_SLOTS.append(slot)

or do I need a manager list? Alternatively, maybe there's a better solution to what I'm doing.
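To make the manager option concrete, here is roughly what I have in mind (a sketch, untested; init_worker and the initializer/initargs wiring are my guesses at how the shared proxies would reach each worker):

import multiprocessing

def init_worker(slots, lock):
    # Runs once in each pool worker: stash the shared proxies in globals
    # so work() can see them.
    global AVAILABLE_GPU_SLOTS, SLOT_LOCK
    AVAILABLE_GPU_SLOTS = slots
    SLOT_LOCK = lock

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_slots = manager.list(build_available_gpu_slots(pool_size, cuda))
    slot_lock = manager.Lock()
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2,
                                initializer=init_worker, initargs=(shared_slots, slot_lock))
    pool.map(work, cmds)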