My CPU count is 8.
You may want to check whether those are logical CPUs or physical CPUs.
That means I can theoretically run 16 threads in my multi-threaded program.
No, you can have as many threads as you please (within reason; if you create thousands of threads, things may not go very well). The operating system will schedule them onto physical (or logical) CPUs as required.
What happens if I create 20 threads and start them at the same time? Since I cannot have that many threads due to hardware limitations, does the OS handle it, or do I have to handle it on my side?
The operating system handles it. However, the operating system has to decide which threads will run and in which order, and you may not agree with the choices the operating system makes, so creating too many threads may be counterproductive. Also, switching between threads carries an inherent overhead, so you usually do not want to create more threads than there are logical CPUs, if your work is CPU-bound.
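As a rough guide for sizing a CPU-bound worker pool, you can ask the operating system how many logical CPUs it reports. A minimal sketch (note that os.process_cpu_count only exists on Python 3.13+, so this falls back to os.cpu_count on older versions):

```python
import os

# Number of logical CPUs the OS reports (may be None if it cannot be determined).
logical_cpus = os.cpu_count()

# On Python 3.13+, os.process_cpu_count() reflects the CPUs this particular
# process is allowed to use (e.g. under an affinity mask); fall back to
# os.cpu_count() on older versions.
available = getattr(os, "process_cpu_count", os.cpu_count)()

print(f"logical CPUs: {logical_cpus}, usable by this process: {available}")
```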
Even though there are 16 theoretical threads, some threads may be already utilized by other programs. Is there a way to get the "available to utilize thread count" in Python and dynamically utilize the maximum possible thread count?
Here we run into a problem: Python has a global interpreter lock (GIL), so the only correct answer to "how many threads can I usefully create?" (as opposed to "how many threads will Python and the operating system allow me to create?") is one. If you create multiple threads, only one thread can execute Python bytecode at a time. The others will have to wait for the lock, and won't be able to do anything useful.
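You can see this with a small, self-contained timing sketch (exact numbers will vary by machine and CPython version): splitting a pure-Python, CPU-bound loop across four threads is not faster than running it on one thread, because the threads take turns holding the GIL.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; it holds the GIL while it runs.
    while n > 0:
        n -= 1

N = 10_000_000

# Do all the work on a single thread.
start = time.perf_counter()
count_down(N)
print(f"1 thread:  {time.perf_counter() - start:.2f}s")

# Split the same total work across four threads.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N // 4,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"4 threads: {time.perf_counter() - start:.2f}s")
```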
The purpose of Python's threads is not to do work on multiple CPUs. Instead, they are intended for multiplexing I/O. That is, you can start I/O operations (such as reading or writing to a file, network socket, pipe, or other IPC mechanism) on as many threads as you like, and all of these I/O operations will run in parallel. Python releases the GIL when you perform an I/O operation, so it will not prevent this sort of parallelism. This is useful if you are trying to write some sort of server. In this use-case, you either create one thread per I/O operation (if you don't need too many) or you create a thread pool which dynamically allocates work items to worker threads, for example with concurrent.futures.ThreadPoolExecutor.
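For the I/O-bound case, here is a minimal ThreadPoolExecutor sketch. time.sleep stands in for a real blocking I/O call (like real I/O, it releases the GIL while waiting), and the pool size of 8 is an arbitrary choice for illustration:

```python
import concurrent.futures
import time

def fake_io(task_id):
    # Placeholder for a blocking I/O call (socket read, file read, ...).
    # The GIL is released while the thread is waiting here.
    time.sleep(1)
    return task_id

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    # Submit 20 one-second "I/O" tasks to 8 worker threads; they overlap,
    # so the total wall-clock time is roughly 20 / 8 seconds, not 20.
    results = list(pool.map(fake_io, range(20)))

print(results)
print(f"finished in {time.perf_counter() - start:.2f}s")
```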