
My iMac Pro has an Intel 3GHz Xeon 10-core W-2150B processor.

My understanding is that this is one CPU albeit with 10 cores.

Now consider this trivial Python code:-

import os
print(f'Number of CPUs={os.cpu_count()}')

This will emit: Number of CPUs=20

...whereas I would expect it to tell me that there's 1 CPU or, arguably, 10.

What's going on here?

2 Answers


Python simply reports the number it gets from the OS. The OS simply reports the number it gets from the firmware. The firmware simply reports the number it gets from the CPU.

So, the short answer to your question is: Python is reporting 20 CPUs because that is what Intel has decided to report.

Now, the question is of course: does it make sense to report 20 CPUs? And the answer to that is not simple. At the time when the system calls that report the number of CPUs were designed, the relationship between sockets, chips, dies, CPUs, execution cores, and threads was a very simple 1:1:1:1:1:1 relationship. So, nobody ever thought about whether the number should mean the number of sockets, the number of CPUs, the number of cores, the number of threads, or whatever, because it didn't matter. It was all the same.

But nowadays, the relationship is much more complex:

  • a socket can hold a package that contains multiple chips
  • each chip can contain multiple dies
  • each die can contain multiple CPUs
  • each CPU can have multiple execution cores
  • each core can potentially execute multiple threads (mostly) in parallel

So, you have to think about what it actually is that you are interested in. And Intel has decided that what you are most likely interested in is the number of hardware threads that can be executed in parallel. And since on the particular CPU that you have, there are 10 execution cores and each core can execute 2 threads, the CPU reports itself as 20 CPUs, even though it is only 1 CPU.
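The arithmetic can be made explicit. A minimal sketch, with the core and thread counts taken from Intel's published spec for the Xeon W-2150B in the question:

```python
import os

# Figures for the questioner's Xeon W-2150B (from Intel's spec sheet):
cores = 10            # physical execution cores
threads_per_core = 2  # Hyper-Threading: 2 hardware threads per core

# os.cpu_count() reports hardware threads, so on that machine it
# returns cores * threads_per_core = 20, not 10 and not 1.
print(cores * threads_per_core)  # 20
print(os.cpu_count())            # whatever your own machine reports
```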

Because as a programmer, you most likely don't care how many blobs of sand you have in your computer, but what you can do with them.

Although in reality, the question is even more complex, because the two threads per execution core are not totally independent, and depending on your specific workload, you might actually only want to use one thread per core, so you might actually need to know the difference between execution cores and threads. Additionally, some recent CPUs have different types of cores: for example, the Apple M1 has 8 cores with identical instruction sets, but 4 of them are optimized for efficiency (and thus somewhat slower) and 4 of them are optimized for performance (and thus consume more power and produce more waste heat).

The current SPARC CPUs from Oracle can schedule up to 8 threads per execution core, of which they can execute up to 2 simultaneously. So, should this CPU report itself as 2 CPUs or 8? And so on, there are dozens of such examples that show that the answer to the question "how many CPUs do I have" is not so simple, and depends heavily on what, precisely, you actually mean by "CPU".

If you want to write high-performance and/or energy-efficient code, a simple number is not enough. You need to know the full hierarchy and dependencies between the different elements.
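To illustrate the gap between "threads" and "cores" in practice, here is a sketch of querying both from Python. It assumes a macOS system, where sysctl exposes hw.physicalcpu separately from the logical count; on other platforms it falls back to os.cpu_count(), which only counts hardware threads:

```python
import os
import platform
import subprocess

def cpu_counts():
    """Return (physical_cores, hardware_threads), best effort.

    Sketch only: on macOS, sysctl exposes hw.physicalcpu for the
    physical-core count; elsewhere we fall back to os.cpu_count(),
    which counts hardware threads, and report the core count as
    unknown (None).
    """
    threads = os.cpu_count()
    if platform.system() == "Darwin":
        out = subprocess.run(["sysctl", "-n", "hw.physicalcpu"],
                             capture_output=True, text=True, check=True)
        return int(out.stdout), threads
    return None, threads  # no portable way to count physical cores

print(cpu_counts())
```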

Jörg W Mittag
  • Thank you for this well-researched response. This raises an interesting dilemma when multi-threading in Python. For example, the default number of 'workers' for a ThreadPoolExecutor is documented as being os.cpu_count() * 5. The implications of this are obvious –  Aug 15 '21 at 08:51
  • *each die can contain multiple CPUs* - Did you have an example in mind for that? Has any vendor ever put multiple fully separate CPUs on one die, each with their own memory controller or front-side bus, and talking to each other only through some interconnect that goes off chip (e.g. to a backplane in the package)? If they were more tightly coupled (like the CCXs in a Zen), we normally consider that parts of a single large CPU, even if like Zen each group of cores has separate L3 cache. – Peter Cordes Aug 15 '21 at 14:58
  • Also, "chip" and "die" are pretty much synonymous, I thought, each meaning a separate physical piece of silicon. There have indeed been CPUs with multiple dies in one package, like Core 2 Quad (basically two Core2Duo dies glued together with some logic, resulting in quite slow cache<->cache transfers between pairs of cores not sharing an L3) and numerous other examples. And now with chiplets we're going to see lots of CPUs with multiple dies, with a core or group of cores being a separate die. – Peter Cordes Aug 15 '21 at 15:03
  • @PeterCordes: There are so many weird processors, I am just covering my bases. From the POWER "processor books" to Azul's Vega-3 with 16 sockets, each with one CPU consisting of 6 clusters of 9 cores (or was it 9 clusters of 6 cores?) to Xeon Phi to FLEET to probably hundreds of crazy designs I have never even heard of. – Jörg W Mittag Aug 15 '21 at 19:38
  • Ok, but if all the cores on a die are part of the same SMP system, I think it's pretty unlikely that there's ever been a commercial design where is would make sense to divide the set of cores on a die into separate groups that you call separate "CPUs". Maybe there's something I'm not considering that would make that make sense somehow, or some criterion for deciding what's a separate CPU or not other than having fully separate external pins. – Peter Cordes Aug 15 '21 at 19:45
  • And yes there are lots of ways to connect cores together, but fortunately it basically comes down to a couple possible models: cache-coherent SMP where we run a single system-image (i.e. looks like one computer that you boot Linux on, for example) and can run multi-threaded programs across those cores; or non-coherent (with shared memory as an interconnect for message passing if there's shared memory at all), and you run separate instances of an OS on separate groups of cores that do have coherent caches. Or the hypothetical case of running threads across a system that need manual flushing. – Peter Cordes Aug 15 '21 at 19:49
  • @PeterCordes: I seem to remember a design with multiple "things" on one die, each having multiple cores and one memory controller with local RAM, i.e. a three-level NUMA architecture, where the "distance" to the RAM depended on whether the cores were part of the same "thing", different "things" on the same die, or on separate dies. I think the history behind it was that, as process sizes shrank and yield increased, instead of designing a next generation, they just copy&pasted their design into two identical copies on one die. I call those "things" CPUs, but that's by no means definitive. – Jörg W Mittag Aug 15 '21 at 19:50
  • Ah ok, I could buy that; a set of cores sharing a set of memory controllers is a NUMA node, and you could have multiple on one die instead of doing what Intel does and have all cores on a die share all memory controllers through a more complex interconnect.. "CPU" is a reasonable choice of name for it, and yeah that's a plausible way you could end up with such a design if your tradeoff between design time vs. number of chips produced is different from say Intel (where it would definitely be worth taking more time to better integrate all cores into one CPU). Thanks for pointing that out. – Peter Cordes Aug 15 '21 at 19:55

EDIT: check out this thread: https://stackoverflow.com/a/1715612/9983575 - there are multiple approaches discussed in the different answers, and a simple one is to run sysctl -n hw.ncpu from the command line (note: this is a shell command, not something you run in the Python interpreter). This should match the output you see from

import os
os.cpu_count()

Specifically, some more details from the comments section:

how many physical cores does the machine have and what chip is it? If it's a core i7 with 2 physical cores for example, it will show as 4 because the chip supports hyper-threading and presents itself to the OS as if it has 4 addressable cores. – jkp Sep 26 '11 at 8:51

ORIGINAL ANSWER:

Just tried this and I also got twice as many CPUs as expected, and looking at other documentation and methods didn't help much.

Things I checked:

Help on built-in function cpu_count in module posix:

cpu_count()
    Return the number of CPUs in the system; return None if indeterminable.
    
    This number is not equivalent to the number of CPUs the current process can
    use.  The number of usable CPUs can be obtained with
    ``len(os.sched_getaffinity(0))``

It also looks like os.sched_getaffinity(0) doesn't work for me locally. The docs at https://docs.python.org/3/library/os.html#os.sched_getaffinity list it as "Availability: Linux", which explains why it is missing on macOS, even though the method is mentioned in help(os.cpu_count).

My guess is that each core on a Mac presents 2 logical CPUs - this would explain the 20 result for you on a 10-core machine, and the 8 result for me on a quad-core machine. I haven't been able to find any specifics about Mac cores/CPUs yet, but will update my answer here if I do. See the EDIT and links - my understanding is that hyper-threading causes 1 physical core to appear as 2 logical cores to the operating system.
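If the goal is a sensible worker count, one portable sketch is to prefer the process's actual CPU affinity where the OS exposes it. Since os.sched_getaffinity is documented as Linux-only (which is why it is absent on macOS), guard it with hasattr:

```python
import os

def usable_cpus():
    """Number of CPUs this process may actually run on.

    os.sched_getaffinity is documented as Linux-only, so guard with
    hasattr and fall back to os.cpu_count() on macOS and Windows.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

print(usable_cpus())
```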

slow-but-steady
  • Unfortunately, os.sched_getaffinity() doesn't seem to be available in 3.9.6 –  Aug 15 '21 at 08:09
  • Ahh that might explain it! I also found a thread that covers this in much more depth than I did in my updated edit, so check that out and see if that covers your question :) – slow-but-steady Aug 15 '21 at 08:11