You were telling Python to execute the function `do_work()` first, and to then pass whatever that function returned to `executor.submit()`:

```python
executor.submit(do_work(count))
```
It might be easier to see this if you used a variable to hold the result of `do_work()`. The following is functionally equivalent to the above:

```python
do_work_result = do_work(count)
executor.submit(do_work_result)
```
In Python, functions are first-class objects; using just the name `do_work` references the function object. Only adding `(...)` to an expression that produces a function object (or another callable) actually executes something.
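To illustrate functions being first-class objects (a minimal sketch; this `do_work` body is made up for demonstration):

```python
def do_work(count):
    return count * 2

f = do_work        # no parentheses: f now refers to the same function object
print(f)           # <function do_work at 0x...>
print(f(21))       # 42 -- only the (...) syntax performs an actual call
```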
In the form

```python
executor.submit(do_work, count)
```

you do not call the function yourself. You pass in the function object as the first argument, and `count` as the second argument. The `executor.submit()` method accepts a callable object together with its arguments, to then later on run that callable with the arguments provided.
This allows the `ThreadPoolExecutor` to take that function reference and the single argument, and only call the function later on, in a worker thread.
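Putting it together, the corrected pattern looks like this (a runnable sketch; the body of `do_work()` is assumed, since the original isn't shown):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def do_work(count):
    # stand-in for the real work; assumed for this example
    time.sleep(1)
    print(f"did work for {count}")

with ThreadPoolExecutor(max_workers=4) as executor:
    for count in range(8):
        # pass the callable and its argument; do not call it here
        executor.submit(do_work, count)
```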
Because you were calling the function first, you had to wait for each call to complete before submitting the next, so everything ran sequentially. And because the functions return `None`, you were passing those `None` references to `executor.submit()`, and would have seen a `TypeError` exception later on telling you that `'NoneType' object is not callable`. That happens because the thread pool executor tried to call `None()`, which fails because `None` is indeed not callable.
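You can reproduce that error directly (a small demonstration, not code from the question):

```python
from concurrent.futures import ThreadPoolExecutor

def do_work(count):
    print(f"working on {count}")   # returns None implicitly

with ThreadPoolExecutor() as executor:
    # do_work(1) runs right here; the None it returns is what gets submitted
    future = executor.submit(do_work(1))
    print(future.exception())      # 'NoneType' object is not callable
```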
Under the hood, the library essentially does this:
```python
def submit(self, fn, *args, **kwargs):
    # record the function to be called as a work item, with other information
    w = _WorkItem(..., fn, args, kwargs)
    self._work_queue.put(w)
```
so a work item referencing the function and arguments is added to a queue. Worker threads are created which take items from that queue; when an item is taken from the queue (in another thread, or a child process), the `_WorkItem.run()` method is called, which runs your function:

```python
result = self.fn(*self.args, **self.kwargs)
```
Only then is the `(...)` call syntax applied. Because there are multiple threads, the code is executed concurrently.
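Here is a stripped-down sketch of that queue-and-worker mechanism (a toy illustration, not the actual concurrent.futures source):

```python
import queue
import threading

work_queue = queue.Queue()

def worker():
    while True:
        fn, args, kwargs = work_queue.get()
        if fn is None:              # sentinel value to shut the worker down
            break
        fn(*args, **kwargs)         # only here is the (...) call made

threading.Thread(target=worker).start()
work_queue.put((print, ("hello from the worker",), {}))
work_queue.put((None, (), {}))      # stop the worker
```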
You do want to read up on how pure Python code can't run in parallel, only concurrently: Does Python support multithreading? Can it speed up execution time?
Your `do_work()` functions only run 'faster' because `time.sleep()` doesn't have to do any actual work; it just tells the kernel not to give the sleeping thread any execution time for the requested duration. You end up with a bunch of threads that are all asleep. If your workers had to execute Python instructions instead, then the total time spent running these functions concurrently or sequentially would not differ all that much.
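You can see that difference with a quick timing comparison (an illustrative sketch; exact numbers will vary per machine):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sleepy(_):
    time.sleep(0.5)         # sleeping releases the GIL, so threads overlap

def busy(_):
    total = 0
    for n in range(2_000_000):
        total += n          # pure Python bytecode; the GIL serialises this

for fn in (sleepy, busy):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as executor:
        for i in range(4):
            executor.submit(fn, i)
    # sleepy finishes in roughly 0.5s; busy takes about 4x a single run
    print(fn.__name__, time.perf_counter() - start)
```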