
I'm on a 12th-gen Intel CPU with hyperthreading enabled. From my lscpu output, I can see that logical cores 0 and 1 both map to physical core 0. I understand that if I allocate processes P0 and P1 to cores 0 and 1, respectively, they'll essentially multiplex physical core 0's resources via SMT. If I allocate both P0 and P1 to logical core 0, however, how is this execution handled? Does the OS just continually context switch them?

I'm just seeing some interesting behavior: P0 and P1 (which communicate with each other) run faster when they're both allocated to the same logical core than when they run as SMT siblings on the same physical core, or when they're allocated to two different physical cores altogether. I can't explain this.

  • What does “allocate to a core” mean here? Is it the same as pinning a process to a core, ie telling the OS that the process may be scheduled to run only on that core and nowhere else? – Jeremy Friesner Jul 16 '22 at 21:47
  • Yup, just context switching the normal way, exactly like on a uniprocessor machine. Re: worse performance: are they accessing the same cache line continuously? If so, serializing their execution in 10ms chunks could be better than constant stalls that make them run much less than half speed. [What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings?](https://stackoverflow.com/q/45602699) / [Should the cache padding size of x86-64 be 128 bytes?](https://stackoverflow.com/q/72126606) – Peter Cordes Jul 16 '22 at 21:56
  • Semi related perhaps, depending what you're doing: [Why does false sharing still affect non atomics, but much less than atomics?](https://stackoverflow.com/q/61672049) – Peter Cordes Jul 16 '22 at 22:02
  • @JeremyFriesner exactly I'm pinning via setaffinity() – cool_cat Jul 16 '22 at 22:12
  • Thanks for your answer @PeterCordes, I'll read the answers you've linked – cool_cat Jul 16 '22 at 22:13
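As context for the pinning discussed in the comments: on Linux, `sched_setaffinity(2)` is exposed directly in Python as `os.sched_setaffinity`, so a minimal self-pinning sketch (Linux-only; picking the lowest allowed CPU is just an arbitrary choice for illustration) looks like this:

```python
import os

# Pin the current process (pid 0 = self) to a single logical core.
# os.sched_setaffinity is Linux-only; it wraps sched_setaffinity(2),
# the same call the comments refer to.
allowed = os.sched_getaffinity(0)   # set of CPUs we may currently run on
cpu = min(allowed)                  # pick one logical core (arbitrary choice)
os.sched_setaffinity(0, {cpu})      # from now on, run only on that core
pinned = os.sched_getaffinity(0)    # now a one-element set: {cpu}
```

Pinning two separate processes this way to the *same* logical core forces the scheduler to time-slice them, which is the situation the question asks about.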

1 Answer


If I allocate both P0 and P1 to logical core 0, however, how is this execution handled?

It depends on the scheduler and what the scheduler is told about P0 and P1. If one process has a higher priority or an earlier deadline, then it may be given 100% of the CPU time (until it terminates, or unless it blocks temporarily). Regularly switching between the processes is a generic option that's often used when the scheduler can't distinguish between them (e.g. they have equal priority).
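For ordinary processes under the default Linux policy (SCHED_OTHER), the user-visible knob for "what the scheduler is told" is niceness. A quick sketch (Linux/Unix only):

```python
import os

# Under SCHED_OTHER, niceness only biases how time slices are shared
# between runnable processes on the core; unlike a real-time policy
# (SCHED_FIFO/SCHED_RR), it never gives one process 100% of the CPU
# while an equal-priority sibling is runnable.
before = os.nice(0)   # adding 0 just reads the current niceness
after = os.nice(5)    # lower our own priority by 5 (raising it needs privileges)
```

So two equally-nice processes pinned to one logical core get the regular-switching behavior described above.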

Note that this regular switching is mostly bad. For example, if both processes each need 1 hour of CPU time to complete, then after 1 hour you end up with no completed processes (two "not quite half-completed, due to the extra overhead of context switches" processes) instead of 1 completed process (and 1 process that's made no progress); and after 2 hours you still end up with no completed processes (two "almost completed, due to the extra overhead of context switches" processes) instead of 2 completed processes.

I'm just seeing some interesting behavior: P0 and P1 (which communicate with each other) run faster when they're both allocated to the same logical core than when they run as SMT siblings on the same physical core, or when they're allocated to two different physical cores altogether. I can't explain this.

I can't explain it either - it depends on how the processes communicate (in addition to depending on the scheduler and what the scheduler is told).
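One way to investigate is a ping-pong microbenchmark that pins each process before the hot loop and times round trips for each placement. This is only a sketch under assumptions the question doesn't state (pipe-based message passing between P0 and P1; Linux, so `os.sched_setaffinity` is available); the names `child` and `ping_pong` are made up here:

```python
import os
import time
import multiprocessing as mp

def child(cpu, conn, n):
    os.sched_setaffinity(0, {cpu})   # pin before the hot loop (Linux-only)
    for _ in range(n):
        conn.send(1)
        conn.recv()
    conn.close()

def ping_pong(cpu_a, cpu_b, n=20000):
    """Time n round trips between two processes pinned to cpu_a and cpu_b."""
    here, there = mp.Pipe()
    p = mp.Process(target=child, args=(cpu_a, there, n))
    p.start()
    os.sched_setaffinity(0, {cpu_b})  # pin the parent too
    t0 = time.perf_counter()
    for _ in range(n):
        here.recv()
        here.send(1)
    elapsed = time.perf_counter() - t0
    p.join()
    return elapsed

# Compare e.g. ping_pong(0, 0) (same logical core, time-sliced),
# ping_pong(0, 1) (SMT siblings) and ping_pong(0, 2) (separate physical
# cores), using the logical-core numbering from your own lscpu output.
```

Note that even with both processes pinned to one logical core this doesn't deadlock: a blocking `recv()` yields the CPU, so the scheduler switches to the peer; each message hand-off then costs a context switch, which is exactly the trade-off being measured.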

Brendan