Incorrect number of running calls with ProcessPoolExecutor in Python

Question

In Python's concurrent.futures standard module, why is the number of running calls in a ProcessPoolExecutor max_workers + 1 instead of max_workers, as in a ThreadPoolExecutor? This happens only when the number of submitted calls is strictly greater than the number of pool worker processes.

The following Python code snippet, which submits 8 calls to 2 workers in a ProcessPoolExecutor:

import concurrent.futures
import time


def call():
    while True:
        time.sleep(1)


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(call) for _ in range(8)]
        time.sleep(5)

        for future in futures:
            print(future.running())

prints this (3 running calls; unexpected since there are 2 workers):

True
True
True
False
False
False
False
False

while using a ThreadPoolExecutor prints this (2 running calls; expected):

True
True
False
False
False
False
False
False
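For reference, here is a minimal sketch of the thread-based comparison that terminates on its own. The helper name count_running_with_threads is hypothetical, and a threading.Event replaces the infinite loop so that the pool can shut down; apart from that it follows the snippet above.

```python
import concurrent.futures
import threading


def count_running_with_threads(max_workers=2, n_calls=8):
    """Hypothetical helper: count futures reporting running() in a thread pool."""
    release = threading.Event()       # lets the blocked calls finish
    started = threading.Semaphore(0)  # released once per call that has started

    def call():
        started.release()  # a worker thread has picked this call up
        release.wait()     # stay "running" until the main thread releases us

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(call) for _ in range(n_calls)]
        try:
            # Wait until every worker thread is busy with one call.
            for _ in range(max_workers):
                started.acquire()
            running = sum(future.running() for future in futures)
        finally:
            release.set()  # unblock all calls so the pool can drain
    return running


if __name__ == "__main__":
    print(count_running_with_threads())
```

With threads, the count should match max_workers, which is the behaviour I expected from the process pool as well.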

what is possible is that there's a race condition between the process launches (which take time) and the status, where with threads it's much faster. I mean: once the first one returned True, it can be False again. The snapshot of the states is not atomic. — Jean-François Fabre, Jun 13 '19 at 19:16
@Jean-FrançoisFabre I tried with a `time.sleep(3)` in between but it makes no difference. — Géry Ogam, Jun 13 '19 at 19:19
using a `sleep` before polling running state changes the number. I got 1 before, now I get 5... — Jean-François Fabre, Jun 13 '19 at 19:22
if you add some prints in call, you'll see that only 2 processes are running. The running state is probably not reliable. — Jean-François Fabre, Jun 13 '19 at 19:40
@Jean-FrançoisFabre So you think that the `Future.running` method is broken? — Géry Ogam, Jun 13 '19 at 19:42

Jean-François Fabre · Answer 1 · 2019-06-13T20:14:46.647

Well, I would not trust this running() method too much. It seems that it doesn't really reflect the actual running state.

The best way to make sure of the process states is to make them print/update something. I've chosen to create a shared dictionary using a multiprocessing.Manager().dict() object.

This process-synchronized object can be consulted/updated safely from any process and has a shared state, even in a multiprocessing environment.

Each time a process is started, update the shared dict with the PID as key and True as a value. Set False on exit.

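import concurrent.futures
import multiprocessing
import os
import time


def call(shared_dict):
    # Mark this worker process (identified by its PID) as busy.
    shared_dict[os.getpid()] = True
    print("start", shared_dict)
    time.sleep(10)
    # Mark this worker process as idle again.
    shared_dict[os.getpid()] = False
    print("end", shared_dict)


if __name__ == "__main__":

    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        # Dictionary proxy shared between the main process and the workers.
        shared_dict = multiprocessing.Manager().dict()
        futures = [executor.submit(call, shared_dict) for _ in range(8)]
        time.sleep(5)
        # Poll the state reported by each future while the first calls sleep.
        for future in futures:
            print(future.running())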

here's the output I'm getting:

start {3076: True}
start {9968: True, 3076: True}
True
True
True
True
True
False
False
False
end {9968: True, 3076: False}
start {9968: True, 3076: True}
end {9968: False, 3076: True}
start {9968: True, 3076: True}
end {9968: True, 3076: False}
start {9968: True, 3076: True}
end {9968: False, 3076: True}
start {9968: True, 3076: True}
end {9968: True, 3076: False}
start {9968: True, 3076: True}
end {9968: False, 3076: True}
start {9968: True, 3076: True}
end {9968: True, 3076: False}
end {9968: False, 3076: False}

As you can see, running() reports 5 running calls, whereas my dictionary clearly shows that:

no more than 2 processes are running at the same time
the processes are created just once at the start, then reused to execute further calls (it's a pool, after all)
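Both observations can be checked mechanically against the printed dictionaries. The following is only a sketch: peak_active is a hypothetical helper, and SNAPSHOTS is a hand-copied subset of the output above, not something the executor provides.

```python
def peak_active(snapshots):
    """Return (peak number of PIDs flagged True at once, number of distinct PIDs)."""
    pids = set()
    peak = 0
    for snapshot in snapshots:
        pids.update(snapshot)  # collect every PID ever seen
        active = sum(1 for busy in snapshot.values() if busy)
        peak = max(peak, active)
    return peak, len(pids)


# A few of the dictionaries printed in the output above (copied by hand).
SNAPSHOTS = [
    {3076: True},
    {9968: True, 3076: True},
    {9968: True, 3076: False},
    {9968: False, 3076: True},
    {9968: False, 3076: False},
]

if __name__ == "__main__":
    print(peak_active(SNAPSHOTS))
```

On these snapshots the helper returns a peak of 2 active PIDs and 2 distinct PIDs overall, consistent with a pool of 2 reused worker processes.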

Let's check the very minimalist documentation:

running() Return True if the call is currently being executed and cannot be cancelled.

It seems to reflect whether the Future object can still be cancelled (because it hasn't been fully initialized or connected to the communication queue yet, so there is still time to cancel it) rather than the actual "running" status of the process itself.

That's probably what this comment in the source code, below the definition of set_running_or_notify_cancel, means:

Mark the future as running or process any cancel notifications.

If the future has been cancelled (cancel() was called and returned True) then any threads waiting on the future completing (though calls to as_completed() or wait()) are notified and False is returned.

If the future was not cancelled then it is put in the running state (future calls to running() will return True) and True is returned.
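A small sketch makes this visible without any pool at all. It builds a bare Future by hand, which the documentation reserves for executor implementations and tests, so take it as an illustration only; demo_running_flag is a hypothetical name.

```python
from concurrent.futures import Future


def demo_running_flag():
    """Show that running() only reflects the future's internal state flag."""
    future = Future()            # no executor, no worker, nothing to execute
    before = future.running()    # a fresh future is pending, not running
    marked = future.set_running_or_notify_cancel()  # what an executor calls
    after = future.running()     # now reports running, though nothing runs
    cancelled = future.cancel()  # a future in the running state can't be cancelled
    return before, marked, after, cancelled


if __name__ == "__main__":
    print(demo_running_flag())
```

So running() returns True as soon as the executor has flipped the flag, whether or not a worker has actually started the call.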

Once again, we learn that it's better to ask subprocesses to collaborate, publishing their status, rather than trying to extort it using unclearly documented methods.

Thanks for the sample code proving that only 2 calls are running. However, to me it looks like a bug in the `Future.running` method implementation, which returns inconsistent states. If you sleep for 5 seconds before checking the future statuses (like in your example), you are sure that all futures are in the pending state except 2, which should be in the running state. So the `Future.running` method should not report 3, 5, or any number other than 2 running calls. Otherwise, why expose such a useless method in the first place? I filed a bug in the Python Bug Tracker: https://bugs.python.org/issue37276. — Géry Ogam, Jun 15 '19 at 11:42
I don't think it's useless. good call about the bug report. I'm interested in the results. The code is non-trivial to say the least. — Jean-François Fabre, Jun 15 '19 at 20:55
I have found a nastier bug: https://stackoverflow.com/questions/56609847 — Géry Ogam, Jun 16 '19 at 19:11
