
I have realized that my multithreading program isn't doing what I think it's doing. The following is an MWE of my strategy. In essence, I'm creating nThreads threads, but only one of them actually does any work. Could somebody help me understand my mistake and how to fix it?

import threading
import queue

NPerThread = 100
nThreads = 4

def worker(q: queue.Queue, oq: queue.Queue):
    # Block until an item arrives on the input queue, build the list of
    # greetings, then push the result to the output queue and mark the
    # item as done.
    while True:
        l = []
        threadIData = q.get(block=True)
        for i in range(threadIData["N"]):
            l.append(f"hello {i} from thread {threading.current_thread().name}")
        oq.put(l)
        q.task_done()

threadData = [{} for i in range(nThreads)]
inputQ = queue.Queue()
outputQ = queue.Queue()

for threadI in range(nThreads):
    threadData[threadI]["thread"] = threading.Thread(
        target=worker, args=(inputQ, outputQ),
        name=f"WorkerThread{threadI}"
    )
    threadData[threadI]["N"] = NPerThread
    threadData[threadI]["thread"].setDaemon(True)
    threadData[threadI]["thread"].start()


for threadI in range(nThreads):
    inputQ.put(threadData[threadI])

inputQ.join()  # block until every queued item has been marked task_done()

outData = [None] * nThreads
count = 0
while not outputQ.empty():
    outData[count] = outputQ.get()
    count += 1


for i in outData:
    assert len(i) == NPerThread
    print(len(i))
print(outData)

Edit

I only realised that I had made this mistake after profiling. Here's the output, for information: [profiler output screenshot]

CiaranWelsh
  • Do I understand you correctly in that multiple CPU cores aren't actually running the threads concurrently? In that case this might be a limitation of Python; see also [this SO thread](https://stackoverflow.com/a/1294402/8484932) and [this Wikipedia page](https://en.wikipedia.org/wiki/Global_interpreter_lock). If you want to use multiple processes, you could look at the `multiprocessing` library. – jrbergen Oct 18 '22 at 10:42
  • The threads are being run concurrently (by the looks of it, anyway). The problem is that three threads just sit there under lock while one thread does all the work. Maybe this is GIL-related, but I really don't know. Thanks for the suggestion; I should maybe try processes rather than threads. – CiaranWelsh Oct 18 '22 at 10:54

1 Answer


In your sample program, the worker function is just executing so fast that the same thread is able to dequeue every item. If you add a time.sleep(1) call to it, you'll see other threads pick up some of the work.
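For example, a minimal variant of your worker with the sleep added (one second per dequeued item is arbitrary; anything long enough for the other threads to wake up and call q.get() will do):

import queue
import threading
import time

def worker(q: queue.Queue, oq: queue.Queue):
    while True:
        threadIData = q.get(block=True)
        # Simulate work that takes a measurable amount of time; without
        # this, the first thread finishes and loops back to q.get()
        # before the others have even been scheduled.
        time.sleep(1)
        l = []
        for i in range(threadIData["N"]):
            l.append(f"hello {i} from thread {threading.current_thread().name}")
        oq.put(l)
        q.task_done()

With the rest of your program unchanged, the printed thread names should now show all four workers contributing.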

However, it is important to understand whether threads are the right choice for your real application, which presumably does actual work in the worker threads. As @jrbergen pointed out, because of the GIL only one thread can execute Python bytecode at a time, so if your worker functions run CPU-bound Python code (i.e. they neither block on I/O nor call into a library that releases the GIL), you won't get a performance benefit from threads. In that case you'd need to use processes instead.
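If that turns out to be the case, your producer/consumer structure carries over to multiprocessing almost unchanged. Here is a rough sketch, with the caveat that the names just mirror your MWE, and that on platforms that spawn rather than fork, worker must be defined at module top level and everything passed through the queues must be picklable:

import multiprocessing

NPerThread = 100
nProcs = 4

def worker(q, oq):
    while True:
        data = q.get(block=True)
        l = [f"hello {i} from {multiprocessing.current_process().name}"
             for i in range(data["N"])]
        oq.put(l)
        q.task_done()

if __name__ == "__main__":
    inputQ = multiprocessing.JoinableQueue()   # supports task_done()/join()
    outputQ = multiprocessing.Queue()
    for i in range(nProcs):
        multiprocessing.Process(target=worker, args=(inputQ, outputQ),
                                name=f"WorkerProcess{i}", daemon=True).start()
    for _ in range(nProcs):
        inputQ.put({"N": NPerThread})
    inputQ.join()
    # One blocking get() per expected result avoids relying on Queue.empty().
    outData = [outputQ.get() for _ in range(nProcs)]
    print(outData)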

I'll also note that you may want to use concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.Pool for an out-of-the-box thread pool implementation, rather than rolling your own.
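For instance, a sketch of your MWE on top of ThreadPoolExecutor (the work() helper is my name, not something from your code); the executor owns the queues and the threads, so all the hand-rolled plumbing disappears:

import threading
from concurrent.futures import ThreadPoolExecutor

NPerThread = 100
nThreads = 4

def work(n: int) -> list:
    return [f"hello {i} from thread {threading.current_thread().name}"
            for i in range(n)]

with ThreadPoolExecutor(max_workers=nThreads) as executor:
    # map() dispatches one task per input and returns results in order.
    outData = list(executor.map(work, [NPerThread] * nThreads))

for chunk in outData:
    assert len(chunk) == NPerThread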

dano
  • Thanks for helping me understand. I've switched to processes now -- I'll get to work with the profiler to see if it actually makes a difference... – CiaranWelsh Oct 18 '22 at 14:09