
The .NET thread pool schedules its threads across logical cores. On machines with Hyper-Threading (HT), this means work is scheduled onto both hardware threads of each physical core.

Is there a thread pool whose threads are created with affinity to physical cores rather than to individual hardware threads?

The goal is to avoid additional contention in floating-point-heavy applications running on the CPU. With such a pool, a developer could simply use a custom scheduler and get tasks running, one per FPU/AVX unit.
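To make the "one task per FPU/AVX unit" idea concrete, here is a minimal, platform-neutral sketch of the selection step only: picking one logical processor per physical core and turning that choice into an affinity bitmask. The `topology` input and both function names are hypothetical; the (logical id, physical core id) pairs are assumed to come from whatever topology query the OS offers (on Windows, for instance, something like `GetLogicalProcessorInformation`). It does not create or pin any threads.

```python
from typing import Iterable, List, Tuple


def one_logical_per_core(topology: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the lowest-numbered logical processor of each physical core.

    `topology` is a hypothetical input: (logical_id, physical_core_id) pairs
    as reported by an OS topology query.
    """
    first_by_core = {}
    for logical_id, core_id in topology:
        # Keep the smallest logical id seen so far for this physical core.
        if core_id not in first_by_core or logical_id < first_by_core[core_id]:
            first_by_core[core_id] = logical_id
    return sorted(first_by_core.values())


def affinity_mask(logical_ids: Iterable[int]) -> int:
    """Build a bitmask with one bit set per chosen logical processor."""
    mask = 0
    for logical_id in logical_ids:
        mask |= 1 << logical_id
    return mask


# Assumed example layout: 4 physical cores, 2 hardware threads each,
# with sibling hardware threads numbered adjacently.
example_topology = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]
chosen = one_logical_per_core(example_topology)
print(chosen)                      # [0, 2, 4, 6]
print(hex(affinity_mask(chosen)))  # 0x55
```

A custom scheduler would then start one worker thread per entry in `chosen` and pin each worker to its processor. Pinning only constrains this process; the OS can still schedule other work on the sibling hardware threads.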

GregC
  • If this is really an issue, why not just disable HT in the BIOS? – Neil Nov 08 '18 at 15:14
  • Because workloads change. Sometimes the workload is integer-intensive, and HT brings nearly double the speed. (Yet another comment could be to use the GPUs.) – GregC Nov 08 '18 at 15:22
  • My next question was going to be: how much does it improve your app if you disable HT? Is 2x theoretical, or have you measured that directly with your code? – Neil Nov 08 '18 at 15:27
  • I'm fighting for that last 10% here. It's not an amazing amount of improvement, but it does help a bit. – GregC Nov 08 '18 at 15:29
  • That is compared with simply scheduling the same number of tasks as there are CPU cores on the default thread pool. – GregC Nov 08 '18 at 15:31
  • The thing is, even if you did manage to disable HT for YOUR process, other processes would still be using HT, including the hardware threads you have decided not to use. – Neil Nov 08 '18 at 15:32
  • The other thing is that if your app is that FP-heavy, don't do it in C#. You should be using C or C++ and the Intel compiler to get much better performance. https://bitsum.com/tips-and-tweaks/why-you-should-not-disable-hyper-threading-or-why-you-should/ – Neil Nov 08 '18 at 15:35
  • I'm calling the IPP library from my app for signal processing. – GregC Nov 08 '18 at 15:56
  • And this is exactly the threading behavior I'm looking for. Everything else runs as it does, but the scheduler gets a hint that these signal-processing tasks should run on the next available FPU. Do not pass go, do not try to schedule to an HT whose twin is executing more signal processing just now. – GregC Nov 08 '18 at 16:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/183311/discussion-between-gregc-and-neil). – GregC Nov 08 '18 at 16:05
  • You might be aware of this already, but HT might help FPU performance because (assuming Intel) multiplication has a latency of 5 cycles but a throughput of 1 per cycle (this differs by model). What this means is that if you order your instructions correctly you get 1 multiplication per cycle (or even 8 with AVX), but if you do it wrong you get just 1/5 of a multiplication per cycle. The processor's out-of-order execution tries to help you but can't do miracles. HT can help here. – Just another metaprogrammer Nov 08 '18 at 16:48
  • With that said, if you need custom affinity, I would look for a ThreadPool library that supports modifying the affinity of the threads in the ThreadPool. – Just another metaprogrammer Nov 08 '18 at 16:50
  • Micro-optimizations are internal to the IPP library. This is a macro scheduling question, please. – GregC Nov 08 '18 at 17:35
  • https://stackoverflow.com/a/1281697/90475 – GregC Nov 10 '18 at 01:53
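
The latency-versus-throughput point raised in the comments can be illustrated with a deliberately simplified model. It is a sketch, not a measurement. It assumes a single pipelined multiplier with the 5-cycle latency and one-per-cycle issue rate quoted in the comment, and it counts only how many independent dependency chains are in flight. The function name and parameters are hypothetical.

```python
def multiplies_per_cycle(independent_chains: int,
                         latency: int = 5,
                         issue_per_cycle: int = 1) -> float:
    """Steady-state multiplies per cycle in a simple pipelined-unit model.

    Each dependency chain can start a new multiply only every `latency`
    cycles, and the unit accepts at most `issue_per_cycle` per cycle.
    The defaults are the figures quoted in the comment, not measured values.
    """
    return min(independent_chains / latency, float(issue_per_cycle))


print(multiplies_per_cycle(1))  # 0.2: one dependent chain, the "1/5" case
print(multiplies_per_cycle(2))  # 0.4: e.g. two hardware threads, one chain each
print(multiplies_per_cycle(5))  # 1.0: enough independent work to fill the pipeline
```

Under this model a second hardware thread helps only while the first leaves pipeline slots empty. Once one thread already supplies enough independent multiplies, the sibling competes for the same unit.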

0 Answers