59

I just had a quick question on how processors and threads work. According to my current understanding, a core can only perform one process at a time. But we are able to create a thread pool (let's say 30 threads) larger than the number of cores that we possess (let's say 4) and have them run concurrently. How is this possible if we only have 4 cores? I am also able to run my 30-thread program on my local computer and continue to perform other activities at the same time, such as watching movies or browsing the internet.

I have read somewhere that scheduling of threads occurs, and that this sort of gives the illusion that the 30 threads are running concurrently on the 4 cores. Is this true, and if so, can someone explain how it works and also recommend some good reading on this?

Thank you in advance for the help.

Wow thank you. All of these are such great responses and have really helped in understanding this subject better. – user5765683 Jan 10 '16 at 03:36

4 Answers

106

Processes vs Threads

In days of old, each process had precisely one thread of execution, so processes were scheduled onto cores directly (and in those old days, there was almost always only one core to schedule onto). However, in operating systems that support threading (which is almost all modern OSes), it is threads, not processes, that are scheduled. So for the rest of this discussion we will talk exclusively about threads, and you should understand that each running process has one or more threads of execution.
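
To make that concrete, here is a minimal Java sketch (the class name is invented for the demo): a single process, i.e. one running JVM, containing two threads of execution.

    public class TwoThreads {
        public static void main(String[] args) {
            // One process (this JVM) containing two threads of execution:
            // the "main" thread plus one we start explicitly.
            Thread worker = new Thread(() ->
                    System.out.println("hello from " + Thread.currentThread().getName()));
            worker.start();
            System.out.println("hello from " + Thread.currentThread().getName());
        }
    }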

Parallelism vs Concurrency

When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:

CPU 1: A ------------------------->

CPU 2: B ------------------------->

When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:

CPU 1: A -----------> B ----------> A -----------> B ---------->

So, for our purposes, parallelism can be thought of as a special case of concurrency.*

Scheduling

But we are able to create a thread pool (let's say 30 threads) larger than the number of cores that we possess (let's say 4) and have them run concurrently. How is this possible if we only have 4 cores?

In this case, they can run concurrently because the CPU scheduler is giving each one of those 30 threads some share of CPU time. Some threads will be running in parallel (if you have 4 cores, then 4 threads will be running in parallel at any one time), but all 30 threads will be running concurrently. The reason you can then go play games or browse the web is that these new threads are added to the thread pool/queue and also given a share of CPU time.
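
To sketch this in Java (the class name is invented for the demo): a fixed pool of 30 threads accepts 30 tasks regardless of the core count, and the OS scheduler shares the 4 cores among all of them.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadPoolDemo {
        public static void main(String[] args) {
            // 30 tasks on a pool of 30 threads; the OS scheduler interleaves
            // them across however many cores the machine actually has.
            ExecutorService pool = Executors.newFixedThreadPool(30);
            for (int i = 0; i < 30; i++) {
                final int id = i;
                pool.submit(() -> System.out.println(
                        "task " + id + " running on " + Thread.currentThread().getName()));
            }
            pool.shutdown(); // already-submitted tasks still run to completion
        }
    }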

Logical vs Physical Cores

According to my current understanding, a core can only perform one process at a time

This is not quite true. Due to very clever hardware design and pipelining that would take much too long to go into here (plus I don't understand it), it is possible for one physical core to actually be executing two completely different threads of execution at the same time. Chew over that sentence a bit if you need to -- it still blows my mind.

This amazing feat is called simultaneous multithreading (or popularly Hyper-Threading, although that is a proprietary name for a specific instance of such technology). Thus, we have physical cores, which are the actual hardware CPU cores, and logical cores, which are the cores the operating system tells software are available for use. Logical cores are essentially an abstraction. In typical modern Intel CPUs, each physical core acts as two logical cores.
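
A quick way to see this from code: Java's Runtime.availableProcessors() reports logical cores, so on a typical 4-core Intel CPU with two-way Hyper-Threading it prints 8 (assuming SMT is enabled in firmware).

    public class LogicalCores {
        public static void main(String[] args) {
            // availableProcessors() reports *logical* cores, so a 4-core CPU
            // with two-way Hyper-Threading typically reports 8.
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }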

can someone explain how this works and also recommend some good reading on this?

I would recommend Operating System Concepts (Silberschatz, Galvin, and Gagne) if you really want to understand how processes, threads, and scheduling all work together.

* The precise meanings of the terms parallel and concurrent are hotly debated, even here on our very own Stack Overflow. What one means by these terms depends a lot on the application domain.
– gardenhead
24

Java does not perform thread scheduling; it leaves thread scheduling to the operating system.

For computationally intensive tasks, it is recommended to have a thread pool size equal to the number of cores available. But for I/O-bound tasks we should have a larger number of threads, because such threads spend most of their time waiting rather than using the CPU. There are many other variations when both types of task are present and need CPU time slices.
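
A minimal Java sketch of those sizing rules (the ×10 multiplier for the I/O pool is just an illustrative assumption, not a universal constant):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            // CPU-bound work: roughly one thread per core; extra threads
            // would mostly add context-switching overhead.
            ExecutorService cpuPool = Executors.newFixedThreadPool(cores);

            // I/O-bound work: threads spend most of their time blocked, so
            // a much larger pool keeps the cores busy while others wait.
            ExecutorService ioPool = Executors.newFixedThreadPool(cores * 10);

            cpuPool.shutdown();
            ioPool.shutdown();
        }
    }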

a core can only perform one process at a time

Yes, but cores can multitask, which creates the illusion that they are running more than one process at a time.

How is this possible if we only have 4 cores? I am also able to run my 30-thread program on my local computer and continue to perform other activities on my computer

This is possible due to multitasking (which is a form of concurrency). Let's say you started 30 threads and the OS is also running 50 threads of its own; all 80 threads will share the 4 CPU cores by getting CPU time slices one by one (one thread per core at a time), which means that on average each core will run 80/4 = 20 threads concurrently. As a result, you will feel that all the threads/processes are running at the same time.

can someone explain how this works

All of this happens at the OS level. If you are a programmer, then you should not need to worry about it. But if you are a student of operating systems, then pick any OS book and learn about multi-threading at the OS level in detail, or find some good research papers for depth. One thing you should know is that each OS handles these things in a different way (but the concepts are generally the same).

There are some languages, like Erlang, that use green threads (or green processes), which gives them the ability to map and schedule threads on their own, bypassing the OS scheduler. So do some research on green threads as well if you are interested.
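
As an aside, Java itself later gained a similar user-mode scheduling mechanism: virtual threads, standard since Java 21 (long after this answer was written). A minimal sketch:

    public class VirtualThreadsDemo {
        public static void main(String[] args) throws InterruptedException {
            // Requires Java 21+. The JVM multiplexes virtual threads onto a
            // small pool of OS "carrier" threads, much as green-thread
            // runtimes like Erlang's do.
            for (int i = 0; i < 10_000; i++) {
                Thread.ofVirtual().start(() -> {
                    // each virtual thread is cheap; 10,000 OS threads would not be
                });
            }
            Thread.sleep(500); // crude wait for the demo threads (sketch only)
        }
    }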

Note: You can also research actors, which are another abstraction over threads. Languages like Erlang and Scala use actors to accomplish tasks. One thread can host hundreds of actors, and each actor can perform a different task (similar to threads in Java).

This is a very vast and active research topic and there are many things to learn.

12

In short, your understanding of a core is mostly correct: a core can execute one thread at a time.

However, your program doesn't really run all 30 threads at once. Of those 30 threads, only 4 are running at any given moment; the other 26 are waiting. The OS scheduler gives each thread a slice of time to run on a core, so all the threads take turns running.

A common misconception:

Having more threads will make my program run faster.

FALSE: Having more threads will NOT always make your program run faster. It just means the CPU has to do more switching, and in fact, having too many threads will make your program run slower because of the overhead of context switching between all the different threads.
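
A micro-benchmark sketch of that effect (the iteration and thread counts are arbitrary choices, and exact timings depend heavily on your machine): for purely CPU-bound work, thread counts beyond the core count buy nothing, and the extra switching can cost a little.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class OversubscriptionDemo {
        static final long TOTAL = 400_000_000L; // total iterations of pure CPU work

        public static void main(String[] args) throws Exception {
            for (int threads : new int[] {1, 4, 30, 200}) {
                long start = System.nanoTime();
                runWith(threads);
                System.out.printf("%3d threads: %d ms%n",
                        threads, (System.nanoTime() - start) / 1_000_000);
            }
        }

        static void runWith(int threadCount) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(threadCount);
            long perThread = TOTAL / threadCount;
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < threadCount; i++) {
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (long j = 0; j < perThread; j++) sum += j; // CPU-bound loop
                    return sum;
                }));
            }
            for (Future<Long> f : results) f.get(); // wait for every task
            pool.shutdown();
        }
    }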

– Andy Guibert
2

I'd like to add to the excellent answer of gardenhead.

As far as I know, even with simultaneous multithreading (Intel's Hyper-Threading is an implementation thereof), the threads running on a specific core need to be from the same process. The reason is that each process has its own virtual address space. When these threads want to access memory, they do so with virtual addresses; but if the threads are from different processes, they use different virtual addresses to refer to the same physical address (this is what it means to have different virtual address spaces). Hence this cannot work.

Let us now talk about a specific implementation of simultaneous multithreading: Hyper-Threading. It allows each core to have two threads scheduled at the same time (from the same process). This is achieved simply by adding a second set of architectural registers; keep in mind that these include special registers like the instruction pointer and the stack pointer. Now, in each clock cycle, the core can decide which thread to fetch an instruction from, since it has all the information it needs to execute instructions from either of the two threads:

  • thanks to the instruction pointers, it knows which instruction it needs to fetch
  • thanks to the stack pointer, it knows where to place variables when the registers are full
  • assuming that the threads are in fact from the same process, they share a heap, so no matter which thread an instruction comes from, the core also knows where to place larger data structures

Usually the core switches to instructions from the other thread when a memory access occurs or a data dependency would force a no-op. The second thread can thus reduce the bubbles in the CPU pipeline. This of course means we cannot expect a 2x speedup, but we do get more throughput for the small cost of adding a set of architectural registers and some logic to decide which instruction pointer (and so on) to choose. Intel reports roughly a 20% speedup, depending on the application.

– sangon-g