I'm working on a project, using Python (3) multiprocessing, that requires frequent synchronization between processes, specifically a barrier. The multiprocessing barrier is too slow, but luckily using semaphores to perform the barrier function is much faster. The implementation looks something like this:
# Initialize.
semaphore_1 = multiprocessing.Semaphore(n - 1)
semaphore_2 = multiprocessing.Semaphore(0)

# Inside each of the n parallel workers
for i in range(n_itr):
    if semaphore_1.acquire(block=False):
        semaphore_2.acquire()
    else:  # Only the last one ends up here.
        # Charge up the outer semaphore to value = n - 1.
        [semaphore_1.release() for _ in range(n - 1)]
        # Let everyone else pass the inner semaphore, to value = 0.
        [semaphore_2.release() for _ in range(n - 1)]
And that seems to keep the processes together.
There is a tip here, down under "Loops", which suggests it might be faster to use the Python builtin map to make the multiple calls to release. But when I try:
map(lambda x: semaphore_1.release(), range(n - 1))
in place of the list comprehension, it doesn't work. The output can be rather confusing, but generally it doesn't make it far past the first iteration.
Is there something funny going on between multiprocessing and map? Or am I not using map correctly?
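One thing that can be checked without multiprocessing at all: in Python 3, map returns a lazy iterator rather than a list, so the function is not called until the result is consumed. A minimal sketch of that check follows; release_stub is a hypothetical stand-in for semaphore_1.release and is not part of the code above.

```python
calls = []


def release_stub(_):
    # Hypothetical stand-in for semaphore_1.release(); it only records the call.
    calls.append(1)


n = 4

# In Python 3, map() builds a lazy iterator; nothing has been called yet.
lazy = map(release_stub, range(n - 1))
calls_before = len(calls)

# Consuming the iterator is what actually runs the function.
for _ in lazy:
    pass
calls_after = len(calls)

print(calls_before, calls_after)
```

If the map object is never iterated, as in the one-liner above, the release calls would never happen, which may be why the workers stall after the first iteration.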
EDIT: Watch out, the above loop doesn't work exactly as is. It needs a guaranteed delay between entries into successive iterations, so that all workers are released from semaphore_2 before any new ones acquire it (this can be accomplished by alternating with another pair of semaphores).
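A rough sketch of the alternating-pair idea from the EDIT, under some assumptions: the names barrier_wait, worker, run_demo, progress and violations are hypothetical, the "fork" start method is assumed (POSIX only), and the progress check exists only to test that the workers stay in lockstep.

```python
import multiprocessing as mp


def barrier_wait(pair, n):
    """Cross one barrier built from an (outer, inner) semaphore pair."""
    outer, inner = pair
    if outer.acquire(block=False):
        # One of the first n - 1 arrivals: wait for the last one.
        inner.acquire()
    else:
        # Last arrival: recharge the outer semaphore to n - 1,
        # then let the n - 1 waiting workers through.
        for _ in range(n - 1):
            outer.release()
        for _ in range(n - 1):
            inner.release()


def worker(rank, n, n_itr, pairs, progress, violations):
    for i in range(n_itr):
        progress[rank] = i + 1         # announce arrival at barrier i
        barrier_wait(pairs[i % 2], n)  # alternate between the two pairs
        # Past the barrier, every worker should have announced arrival.
        if min(progress[:]) < i + 1:
            with violations.get_lock():
                violations.value += 1


def run_demo(n=4, n_itr=200):
    # Assumption: "fork" is available; under "spawn", call run_demo()
    # only from inside an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    pairs = [(ctx.Semaphore(n - 1), ctx.Semaphore(0)) for _ in range(2)]
    progress = ctx.Array("i", n, lock=False)  # one slot per worker
    violations = ctx.Value("i", 0)
    procs = [
        ctx.Process(target=worker,
                    args=(rank, n, n_itr, pairs, progress, violations))
        for rank in range(n)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return violations.value


if __name__ == "__main__":
    print("barrier violations:", run_demo())
```

The reasoning behind using two pairs: a worker can only reach the next use of the same pair after everyone has arrived at the barrier in between, and arriving there means each worker has already consumed its permit from the earlier one. Plain for loops are used for the release calls so that they are guaranteed to run.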