My CPU count is 8. That means I can theoretically have 16 threads to run my multi-threaded program. I have a few questions.

  1. What happens if I create 20 threads and start them at the same time? Since I cannot have that many threads due to hardware limitations, does the OS handle it, or do I have to handle it myself?
  2. Even though there are 16 theoretical threads, some of them may already be in use by other programs. Is there a way to get the "available to utilize thread count" in Python and dynamically use the maximum possible number of threads?
Pradeep Sanjeewa

2 Answers


> My CPU count is 8.

You may want to check whether those are logical CPUs or physical CPUs.
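As a rough sketch of how to check this from Python: `os.cpu_count()` reports logical CPUs, and on platforms that provide it (Linux, for example) `os.sched_getaffinity(0)` returns the set of CPUs the current process is allowed to run on. The helper name `usable_cpu_count` is made up for this example. Neither call tells you how busy those CPUs currently are.

```python
import os


def usable_cpu_count():
    """Return the number of logical CPUs this process may run on.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


print("logical CPUs in the system:", os.cpu_count())
print("logical CPUs usable by this process:", usable_cpu_count())
```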

> That means I can theoretically have 16 threads to run my multi-threaded program.

No, you can have as many threads as you please (within reason; if you create thousands of threads, things may not go very well). The operating system will schedule them onto physical (or logical) CPUs as required.

> What happens if I create 20 threads and start them at the same time? Since I cannot have that much of threads due to hardware limitations, does OS handle it or do I have to handle it from my side?

The operating system handles it. However, the operating system has to decide which threads will run and in which order, and you may not agree with the choices the operating system makes, so creating too many threads may be counterproductive. Also, switching between threads carries an inherent overhead, so you usually do not want to create more threads than there are logical CPUs, if your work is CPU-bound.
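To illustrate that nothing special is needed on your side, here is a minimal sketch that starts 20 threads regardless of how many CPUs the machine has. The `worker` function and its toy workload are placeholders invented for this example.

```python
import threading

NUM_THREADS = 20  # deliberately more than the CPU count from the question


def worker(index, results):
    # Each thread writes only to its own slot, so no lock is needed here.
    results[index] = sum(range(100_000))


results = [None] * NUM_THREADS
threads = [
    threading.Thread(target=worker, args=(i, results))
    for i in range(NUM_THREADS)
]

for t in threads:
    t.start()  # the OS decides when and where each thread runs
for t in threads:
    t.join()   # wait until every thread has finished

finished = sum(r is not None for r in results)
print(f"{finished} of {NUM_THREADS} threads finished")
```

All 20 threads complete; the scheduler simply interleaves them on whatever CPUs are available.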

> Even though there are 16 theoretical threads, some threads may be already utilized by other programs. Is there a way to get the "available to utilize thread count" in Python and dynamically utilize the maximum possible thread count?

Here we run into a problem: CPython, the standard Python implementation, has a global interpreter lock (GIL), so for CPU-bound Python code the only correct answer to "how many threads can I usefully create?" (as opposed to "how many threads will Python and the operating system allow me to create?") is one. If you create multiple threads, only one thread can execute Python bytecode at a time. The others have to wait for the lock and cannot do anything useful in the meantime.

The purpose of Python's threads is not to do work on multiple CPUs. Instead, they are intended for multiplexing I/O. That is, you can start I/O operations (such as reading or writing to a file, network socket, pipe, or other IPC mechanism) on as many threads as you like, and all of these I/O operations will run in parallel. Python releases the GIL when you perform an I/O operation, so it will not prevent this sort of parallelism. This is useful if you are trying to write some sort of server. In this use-case, you either create one thread per I/O operation (if you don't need too many) or you create a thread pool which dynamically allocates work items to worker threads, for example with concurrent.futures.ThreadPoolExecutor.
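A minimal sketch of the thread-pool approach, assuming `time.sleep` as a stand-in for a real blocking I/O call (like real I/O, it releases the GIL while waiting). The `fake_io` function is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(item):
    # time.sleep stands in for a blocking read/write; the GIL is released
    # while the thread waits, so other threads can proceed.
    time.sleep(0.2)
    return item * 2


items = list(range(10))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # pool.map hands work items to the worker threads and
    # returns the results in input order.
    doubled = list(pool.map(fake_io, items))
elapsed = time.perf_counter() - start

print(doubled)
print(f"elapsed: {elapsed:.2f}s")
```

Run sequentially, the ten waits would add up to about two seconds; with the pool, the waits overlap, so the elapsed time should be much shorter.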

Kevin
  • Hi @Kevin, I used 'os.cpu_count()'. Does it provide the number of logical CPUs or physical CPUs? – Pradeep Sanjeewa Mar 09 '19 at 04:14
  • @PradeepSanjeewa: Based on https://stackoverflow.com/q/38194951/1340389 I believe the "correct" answer is logical CPUs, but apparently on Windows it [may just return a completely incorrect number](https://bugs.python.org/issue33166). – Kevin Mar 09 '19 at 04:18

You are mixing up hardware-side hyper-threading and software-side threading. The former makes each physical core appear to the operating system as more than one logical core, so it basically emulates more CPU cores than you have. It has nothing to do with what we call threads in software programming.

Threads (the software ones) are not like a resource that a computer has and that can be assigned to a process. Threads are like processes, but they share the address space of their parent process. So they can access the same variables - different processes usually can't do that.
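A small sketch of that shared address space: 20 threads update the same dictionary, guarded by a lock so that increments are not lost. The names `increment` and `shared` are made up for this example.

```python
import threading


def increment(shared, lock, times):
    for _ in range(times):
        # All threads see the same dict object; the lock keeps the
        # read-modify-write step from interleaving between threads.
        with lock:
            shared["count"] += 1


shared = {"count": 0}
lock = threading.Lock()

threads = [
    threading.Thread(target=increment, args=(shared, lock, 1000))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"])  # 20000
```

Separate processes would each get their own copy of `shared` and would need an explicit IPC mechanism to exchange the value.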

So just as you can open a text editor 20 times, you can also start 20 threads. However, the fact that you can does not mean that you should: https://stackoverflow.com/a/481979/8575607

Further reading: Maximum number of threads per process in Linux?


Edit: Adding to Kevin's answer: there are still reasons to use multiple threads (e.g. if you access software and draw a UI at the same time, or for rendering non-blocking UI overlays); the GIL does not make threads any less useful for that. The threads still run concurrently, although no two bytecode instructions in one CPython process are executed at the same time. (By the way, this is not a comment because I do not yet have enough reputation to comment on other people's posts.)

2xB