
In a heterogeneous multiprocessing (HMP) model, the different cores of a CPU or SoC don't have the same performance profiles. While HMP systems were first deployed a while ago (the wiki mentions Samsung started using this model back in 2013), they are coming to a head with Apple's M-series (technically it was already an issue with Hyper-Threading, which could be considered a form of HMP, I guess).

Parallelized tools generally try to guess the number of workers they should create by counting the number of cores on the system. In an HMP model that can be counter-productive, especially on personal devices: while the "efficient" cores are technically available, their performance is low, so loading them yields little gain, and it can drastically hurt the interactivity / pleasantness of the system. This is usually configurable (so the user can set something more reasonable), but it seems a better default to restrict such tools to "high-performance" cores only (at least assuming they are CPU-bound), leaving users the option to increase residency if they so choose.
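
For reference, the "count everything" default most tools rely on boils down to something like this minimal C++ sketch (the 4+4 split in the comment is just a typical Apple Silicon example, not something the call reports):

    #include <iostream>
    #include <thread>

    int main() {
        // The usual heuristic: one worker per logical CPU.
        // On an HMP system (e.g. an M1 with 4 performance + 4 efficiency
        // cores) this returns 8, even though only half of those cores
        // are "fast" — and with SMT it can overcount even further.
        unsigned n = std::thread::hardware_concurrency();
        std::cout << "default worker count: " << n << "\n";
    }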

So: is there a portable way to list and segregate "real" and "high-performance" cores from "virtual" and "efficiency" cores?
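
To make the ask concrete, here is a sketch of the kind of non-portable query that (as far as I can tell) exists on macOS / Apple Silicon via sysctl; the hw.perflevel* key names are an assumption on my part and only appear on recent macOS, so treat this as an illustration of what I'd like a portable equivalent of, not as a vetted answer:

    #include <sys/sysctl.h>
    #include <cstdio>

    // Read an unsigned sysctl value; returns -1 if the key doesn't exist
    // (older macOS, Intel Mac, ...).
    static int sysctl_u32(const char* name) {
        unsigned value = 0;
        size_t len = sizeof(value);
        if (sysctlbyname(name, &value, &len, nullptr, 0) != 0)
            return -1;
        return static_cast<int>(value);
    }

    int main() {
        // perflevel0 = performance ("P") cores, perflevel1 = efficiency ("E") cores
        std::printf("P cores: %d\n", sysctl_u32("hw.perflevel0.logicalcpu"));
        std::printf("E cores: %d\n", sysctl_u32("hw.perflevel1.logicalcpu"));
    }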

Note: I've seen question 68444429 ("how can I distinguish between high- and low-performance cores in C++"); while the title is similar, the question and goals are rather different, as the goal here is to avoid generating unnecessary and inefficient work by default.

Masklinn
  • Such architectures are currently very badly supported by portable software. Without support from the OS and hardware vendors, applications will likely use a bad number of workers by default, and this requires developers to support the target hardware manually. This is a huge problem, as writing parallel applications is already complex and such processors are expected to become more mainstream in the future (due to energy consumption). This is also already a limitation for hybrid architectures. – Jérôme Richard Mar 02 '22 at 17:53
  • Intel chose to adapt the number of hardware threads per core on its hybrid architecture so that applications do not need to care much about it. The high-performance cores are designed to be about 2 times faster if applications are optimized. Because there are 2 hardware threads on such cores, the work balancing is quite good in practice. Moreover, cores are not disabled by default. The OS scheduling layer is responsible for scheduling the work correctly. To write portable applications, you should not care about this and should use runtimes that support it (currently many do not do so well). – Jérôme Richard Mar 02 '22 at 17:59
  • @JérômeRichard What you describe from Intel doesn't seem any more useful than the usual: not only do you *still* not want to run a worker per high-efficiency core, as noted you may not even want to run a worker per hyperthread, but rather just one worker per physical high-performance core. As to not caring about this and letting the runtime handle it, it's absolute garbage unless you're using Apple's Grand Central Dispatch, which locks you into macOS. – Masklinn Mar 02 '22 at 19:18
  • I don't want the scheduler to make things up and drive itself into a ditch; I want the ability to spawn a sensible number of worker threads / processes to parallelise the work over. – Masklinn Mar 02 '22 at 19:19
  • My point is that it is not up to the application to do that but to runtimes/tools, and they should request this from the OS. As said in the linked answer: "Any attempt to break the "thread" abstraction (e.g. determine if the core is a low-performance or high-performance core) is misguided". The bad news is that such processors are still very badly supported: AFAIK neither Linux nor Windows provides a standard interface to get this information (as there is for NUMA). Most runtimes do not support it and tools barely do. Thus there is currently no portable way to do that, nor any great solution (but IDK about macOS). – Jérôme Richard Mar 02 '22 at 20:55

0 Answers