What are the differences between multi-CPU, multi-core and hyper-thread?

Question

Could anyone explain to me the differences between multi-CPU, multi-core, and hyper-thread? I am always confused about these differences, and about the pros/cons of each architecture in different scenarios.

Here is my current understanding after learning online and learning from others' comments.

I think hyper-threading is the most inferior technology among them, but it is cheap. Its main idea is to duplicate registers to save context-switch time;
Multi-processor is better than hyper-threading, but since the CPUs are on different chips, communication between them has higher latency than with multi-core, and using multiple chips costs more and consumes more power than multi-core;
Multi-core integrates all the CPUs on a single chip, so the latency of communication between CPUs is greatly reduced compared with multi-processor. Since it uses a single chip to contain all the CPUs, it consumes less power and is less expensive than a multi-processor system.

Is this correct?

Hyperthreading is not inferior. It is quite useful, particularly for servers. There are diminishing returns from ILP (keeping processor busy by rearranging sequential instructions). Hyperthreading is an alternative to boost parallelism: multiple hardware threads execute without much overhead. — amit kumar, Mar 26 '09 at 03:29
How about my other points about multi-core and multi-processor, do you think my points are correct? Anything wrong? — George2, Mar 26 '09 at 04:58
@George2 - Your edit is very true. That is the whole idea. :) The best thing you can get on a server is probably a multi-core multi-CPU, but for usual usage multi-core is the best shot! — Bogdan Constantinescu, Mar 26 '09 at 14:35
Thanks Bogdan, with your confirmation, I am more confident! :-) — George2, Mar 27 '09 at 13:06
That's a terrible description of hyperthreading. The main point is to give up some per-thread performance to gain higher total throughput, with only a bit of extra hardware in the front-end of an out-of-order core. See [this Q&A](http://stackoverflow.com/questions/35748305/is-duplication-of-state-resources-considered-optimal-for-hyper-threading) asking about a similar paragraph in the accepted answer (before my edit that fixed it). — Peter Cordes, Oct 11 '16 at 18:33

score 89 · Accepted Answer · edited May 23 '17 at 12:26

Multi-CPU was the first version: You'd have one or more mainboards with one or more CPU chips on them. The main problem here was that the CPUs would have to expose some of their internal data to the other CPU so they wouldn't get in their way.

The next step was hyper-threading: one chip on the mainboard, but with some internal state (such as the register set) duplicated so that a single core could keep two threads in flight at the same time.

The current development is multi-core. It's basically the original idea (several complete CPUs) but in a single chip. The advantage: Chip designers can easily put the additional wires for the sync signals into the chip (instead of having to route them out on a pin, then over the crowded mainboard and up into a second chip).

Supercomputers today are multi-CPU and multi-core: they have lots of mainboards, usually with 2-4 CPUs each; each CPU is multi-core and has its own RAM.

[EDIT] You got that pretty much right. Just a few minor points:

Hyper-threading keeps track of two contexts at once in a single core, exposing more parallelism to the out-of-order CPU core. This keeps the execution units fed with work, even when one thread is stalled on a cache miss, branch mispredict, or waiting for results from high-latency instructions. It's a way to get more total throughput without replicating much hardware, but if anything it slows down each thread individually. See [this Q&A](http://stackoverflow.com/questions/35748305/is-duplication-of-state-resources-considered-optimal-for-hyper-threading) for more details, and an explanation of what was wrong with the previous wording of this paragraph.
The main problem with multi-CPU is that code running on the CPUs will eventually access the RAM. There are N CPUs but only one bus to access the RAM. So you must have some hardware which makes sure that a) each CPU gets a fair amount of RAM access, b) accesses to the same part of the RAM don't cause problems and c) most importantly, CPU 2 is notified when CPU 1 writes to a memory address which CPU 2 has in its internal cache. If that doesn't happen, CPU 2 will happily use the cached value, oblivious to the fact that it is outdated.

Just imagine you have tasks in a list and you want to spread them across all available CPUs. So CPU 1 will fetch the first element from the list and update the pointers. CPU 2 will do the same. For efficiency reasons, both CPUs will copy not just the few bytes they need into the cache but a whole "cache line" (whatever that may be). The assumption is that, when you read byte X, you'll soon read X+1, too.

Now both CPUs have a copy of the memory in their cache. CPU 1 will then fetch the next item from the list. Without cache sync, it won't have noticed that CPU 2 has changed the list, too, and it will start to work on the same item as CPU 2.

This is what effectively makes multi-CPU so complicated. Side effects of this can lead to performance that is worse than what you'd get if the whole code ran on a single CPU. The solution was multi-core: You can easily add as many wires as you need to synchronize the caches; you could even copy data from one cache to another (updating parts of a cache line without having to flush and reload it), etc. Or the cache logic could make sure that all CPUs get the same cache line when they access the same part of real RAM, simply blocking CPU 2 for a few nanoseconds until CPU 1 has made its changes.
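The stale-cache scenario with the task list can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how real hardware is built: each "cache" is a dict holding a private copy of the list's next-item index, writes go straight through to RAM, and the `coherent` flag and `fetch_task` helper are hypothetical names made up for the illustration. Real CPUs use hardware coherence protocols (MESI and its variants), and real code would additionally need atomic operations or locks.

```python
def run(coherent):
    """Let two CPUs alternately take tasks from a shared list.

    Returns the tasks each CPU ended up working on.
    """
    tasks = ["A", "B", "C", "D"]
    ram = {"next": 0}      # shared "next item" index, stored in RAM
    caches = [{}, {}]      # one private cache per CPU (toy model)
    taken = [[], []]       # what each CPU worked on

    def fetch_task(cpu):
        cache = caches[cpu]
        if "next" not in cache:          # cache miss: load the value from RAM
            cache["next"] = ram["next"]
        index = cache["next"]            # stale if another CPU wrote meanwhile
        cache["next"] = index + 1        # update the pointer in the cache ...
        ram["next"] = index + 1          # ... and write it through to RAM
        if coherent:
            # Assumed coherence mechanism: invalidate every other CPU's copy
            # so its next access misses and reloads the fresh value.
            for other, other_cache in enumerate(caches):
                if other != cpu:
                    other_cache.pop("next", None)
        taken[cpu].append(tasks[index])

    for cpu in (0, 1, 0, 1):             # CPU 1, CPU 2, CPU 1, CPU 2
        fetch_task(cpu)
    return taken


if __name__ == "__main__":
    print("without cache sync:", run(coherent=False))
    print("with cache sync:   ", run(coherent=True))
```

Without the invalidation step, both CPUs end up working on task "B"; with it, every task is handed out exactly once.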

[EDIT2] The main reason why multi-core is simpler than multi-CPU is that on a mainboard, you simply can't run all the wires between the two chips that you'd need to make sync effective. Plus a signal only travels 30cm/ns tops (speed of light; in a wire, you usually have much less). And don't forget that, on a multi-layer mainboard, signals start to influence each other (crosstalk). We like to think that 0 is 0V and 1 is 5V but in reality, "0" is something between -0.5V (overdrive when dropping a line from 1->0) and 0.5V and "1" is anything above 0.8V.

If you have everything inside of a single chip, signals run much faster and you can have as many as you like (well, almost :). Also, signal crosstalk is much easier to control.
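To put the 30cm/ns figure in perspective, here is a small back-of-the-envelope calculation of how far a signal can travel during one clock cycle. The clock rates and the `velocity_factor` of 0.5 (a rough guess for a signal on a circuit-board trace) are assumptions chosen for illustration, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_CM_PER_NS = 29.98  # upper bound, in vacuum


def max_distance_per_cycle_cm(clock_ghz, velocity_factor=1.0):
    """Distance (cm) a signal can cover in one clock cycle.

    velocity_factor is the fraction of the speed of light the signal
    actually reaches in the conductor (1.0 = ideal upper bound).
    """
    cycle_ns = 1.0 / clock_ghz  # one cycle at 1 GHz lasts 1 ns
    return SPEED_OF_LIGHT_CM_PER_NS * velocity_factor * cycle_ns


if __name__ == "__main__":
    for ghz in (1.0, 3.0, 5.0):
        ideal = max_distance_per_cycle_cm(ghz)
        trace = max_distance_per_cycle_cm(ghz, velocity_factor=0.5)  # assumed
        print(f"{ghz:.0f} GHz: at most {ideal:.1f} cm per cycle, "
              f"about {trace:.1f} cm at half the speed of light")
```

At 3 GHz even the ideal bound is roughly 10 cm per cycle, so a request to a second chip and the reply coming back can easily cost several cycles in wire delay alone, which is one reason keeping the synchronization inside a single chip pays off.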

Your notion of hyper-threading can be a bit misleading, as hyperthreading "just" simulates parallel execution of multiple threads - but mainly tries to improve multi-threaded performance by means of built-in CPU logic. — J.C. Inacio, Mar 25 '09 at 08:54
@jcinacio, does hyper-threading improve multi-process performance? Why? — George2, Mar 26 '09 at 03:15
@Aaron, 1. I have edited my current points in my original post after learning from you. Could you help review and comment, please? 2. What does "expose some of their internal data to the other CPU so they wouldn't get in their way." mean in your post? — George2, Mar 26 '09 at 03:18
@Aaron, your reply is excellent. My last question: why do you say a multi-core CPU solves the CPU state synchronization and wait-for-RAM issues? I think that if the code logic is the same, the synchronization and wait-for-RAM issues still exist. Any comments? — George2, Mar 26 '09 at 11:37
@Aaron, great reply! I want to confirm with you that a multi-core system still needs to handle issues like cache synchronization between CPUs and waiting for RAM, the same issues as a multi-processor system; multi-core just handles them with better performance. Correct? — George2, Mar 26 '09 at 13:16
Can you compare hyper-threading with superscalar architecture? — Amit P, May 30 '16 at 11:25
@AmitP No; you can probably ask a new question on http://programmers.stackexchange.com/ — Aaron Digulla, May 30 '16 at 15:49
@AmitP please note that [too broad](http://meta.programmers.stackexchange.com/questions/6483/why-was-my-question-closed-or-down-voted/6490#6490) questions tend to be voted down and closed at Programmers, see **[What goes on Programmers.SE? A guide for Stack Overflow](http://meta.programmers.stackexchange.com/q/7182/31260)** — gnat, May 30 '16 at 16:01

score 4 · Answer 2 · answered Mar 25 '09 at 08:46

You can find some interesting articles about dual CPU, multi-core and hyper-threading on Intel's website or in a short article from Yale University.

I hope you find all the information you need there.

answered Mar 25 '09 at 08:46 · Bogdan Constantinescu

Bogdan, I have edited my current points in my original post. Could you help review and comment, please? I learned them after reading your recommended links. – George2 Mar 26 '09 at 03:13
@George2 - Your edit is very true. That is the whole idea. :) The best thing you can get on a server is probably a multi-core multi-CPU – Bogdan Constantinescu Mar 26 '09 at 14:28
Both links are broken :( – rkachach Oct 30 '16 at 11:36

score 2 · Answer 3 · amit kumar · 2009-03-26T16:36:35.230

In a nutshell: a multi-CPU or multi-processor system has several processors. A multi-core system is a multi-processor system with several processors on the same die. In hyperthreading, multiple threads can run on the same processor (that is, the context-switch time between these threads is very small).

Multi-processors have been around for 30 years now, but mostly in labs. Multi-core is the new popular multi-processor. Server processors nowadays implement hyperthreading along with multi-processors.

The Wikipedia articles on these topics are quite illustrative.

answered Mar 25 '09 at 08:40 · edited Mar 26 '09 at 16:36 · amit kumar

Amit, 1. I have edited my current points in my original post after learning from you. Could you help review and comment, please? 2. What do "die" and "tear" mean in your post? – George2 Mar 26 '09 at 03:14

score 0 · Answer 4 · answered Aug 18 '22 at 15:00

Hyperthreading is a cheaper and slower alternative to having multiple cores

The Intel Manual Volume 3 System Programming Guide - 325384-056US September 2015 8.7 "INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE" describes HT briefly. It contains the following diagram:

TODO: by what percentage is it slower on average in real applications?

Hyperthreading is possible because modern single CPU cores already execute multiple instructions at once with the instruction pipeline: https://en.wikipedia.org/wiki/Instruction_pipelining

The instruction pipeline is a separation of functions inside of a single core to ensure that each part of the circuit is used at any given time: reading memory, decoding instructions, executing instructions, etc.

Hyperthreading separates functions further by using:

a single backend, which actually runs the instructions with its pipeline. A dual-core CPU has two backends, which explains its greater cost and performance.

two front-ends, which take two streams of instructions and order them to maximize pipeline usage of the single backend by avoiding hazards. A dual-core CPU would also have two front-ends, one for each backend.

There are edge cases where instruction reordering produces no benefit, making hyperthreading useless. But it produces a significant improvement on average.
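The idea of two instruction streams sharing one backend can be illustrated with a deliberately crude simulation. Everything here is an assumption made for the sketch: the backend issues at most one instruction per cycle, `"c"` is a one-cycle compute instruction, `"m"` is a load that misses the cache and keeps its thread waiting for `stall` extra cycles, and the function name is hypothetical. Real cores are far more complex (superscalar, out-of-order), so only the trend is meaningful, not the numbers.

```python
def cycles_to_finish(threads, stall=3):
    """Cycles one single-issue backend needs to finish all threads.

    threads: list of strings, 'c' = compute, 'm' = cache-missing load.
    Each cycle the backend issues one instruction from the first thread
    that is not waiting on a miss; if none is ready, the cycle is wasted.
    """
    pc = [0] * len(threads)         # next instruction index per thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        for i, t in enumerate(threads):
            if pc[i] < len(t) and ready_at[i] <= cycle:
                op = t[pc[i]]
                pc[i] += 1
                ready_at[i] = cycle + 1 + (stall if op == "m" else 0)
                break               # only one instruction issues per cycle
        cycle += 1
    return max(cycle, *ready_at)    # include a trailing miss, if any


if __name__ == "__main__":
    for a, b in (("mcc", "mcc"), ("cccc", "cccc")):
        one_after_another = cycles_to_finish([a]) + cycles_to_finish([b])
        interleaved = cycles_to_finish([a, b])
        print(f"{a!r} + {b!r}: {one_after_another} cycles back to back, "
              f"{interleaved} cycles interleaved")
```

In this model the two stall-heavy threads finish sooner when interleaved, because one thread's miss is hidden behind the other's work, although each thread individually finishes no earlier than it would alone. The two pure-compute threads gain nothing, which corresponds to the edge case mentioned above.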

Two hyperthreads in a single core share more cache levels (TODO how many? L1?) than two different cores, which share only L3, see:

The interface that each hyperthread exposes to the operating system is similar to that of an actual core, and both can be controlled separately. Thus cat /proc/cpuinfo shows me 4 processors, even though I only have 2 cores with 2 hyperthreads each.
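The difference between logical processors and physical cores can be read out of /proc/cpuinfo programmatically. The following is a minimal sketch, assuming the x86 Linux layout in which each logical processor is a blank-line-separated block with `processor`, `physical id` and `core id` fields; other architectures may omit these fields, in which case the core count below is not meaningful. The `SAMPLE` text is a made-up, heavily trimmed excerpt for a machine with 2 cores and 2 hyperthreads each, and `count_cpus` is a hypothetical helper name.

```python
from pathlib import Path

# Hypothetical, trimmed excerpt: 1 socket, 2 cores, 2 hyperthreads per core.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""


def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue                 # not a per-processor block
        logical += 1
        # A physical core is identified by its socket plus its core id.
        cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else SAMPLE
    print("logical processors, physical cores:", count_cpus(text))
```

For the sample text this reports 4 logical processors on 2 physical cores, matching the situation described above.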

Operating systems can however take advantage of knowing which hyperthreads are on the same core to run multiple threads of a given program on a single core, which might improve cache usage.

This LinusTechTips video contains a light-hearted non-technical explanation: https://www.youtube.com/watch?v=wnS50lJicXc

Multi-CPU is a bit like multicore, but communication can only happen through RAM, not L3 cache

This means that, if possible, you want to group tasks that heavily use the same memory onto the same CPU.

E.g. the following SBI-7228R-T2X blade server contains 4 CPUs, 2 on each node:

Source.

We can see that there seem to be 4 sockets for the CPUs, each covered by a heat sink, with one open.

I think that across the nodes they don't even share RAM and must communicate through some kind of networking, which represents one further step up in the hyperthread/multicore/multi-CPU hierarchy. TODO confirm:
