23

Say we want to compile a large project (say GCC or the Linux kernel) as fast as possible. Does a CPU with hyperthreading capability (say an Intel Core i7) run the compiler any faster with hyperthreading enabled or disabled? Are there any published benchmarks that test this?

My understanding of hyperthreading is that each core can select instructions from two (or more) processes. This usually makes the core more efficient, since functional units are less likely to sit idle. However, there's also potential for a performance penalty, since processes running on the same core share resources such as the cache and may interfere with one another. Whether performance actually increases depends on the workload.

So for a compiler workload, does performance increase? If so, by how much?
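
A straightforward test, assuming a GNU make based build with the sources hot in the disk cache, would be to time clean builds at a few -j levels and compare the wall-clock numbers, something like:

for j in 1 2 4 8; do                    # job counts to sweep; adjust to the CPU
    make clean > /dev/null
    /usr/bin/time -f "-j$j: %e s elapsed, %P CPU" make -j"$j" > /dev/null
done

Running the same sweep with hyperthreading enabled and then disabled should isolate its effect.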

Jay Conrod
  • I have no recent experience with this, but doesn't compilation tend to be I/O-bound? – Ken Jan 06 '10 at 18:12
  • Play with "make -j N" and measure system resources for different N? – Nikolai Fetissov Jan 06 '10 at 18:17
  • @Nikolai, I would if I had a hyperthreaded CPU to play with. I'm asking this so I know whether purchasing one is worthwhile. – Jay Conrod Jan 06 '10 at 18:47
  • @Ken, my experience is the opposite. Since I'm generally making small changes and recompiling frequently, all the sources are generally in the disk cache. I regularly see 100% CPU usage when compiling. – Jay Conrod Jan 06 '10 at 18:49

2 Answers

27

Compiling coreutils-8.4 on Ubuntu 8.04 x86

Intel Atom 1.6 GHz with HT enabled:

~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make > /dev/null

real    2m33.375s
user    2m22.873s
sys     0m10.541s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make -j2 > /dev/null

real    1m54.707s
user    3m26.121s
sys     0m13.821s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make > /dev/null

real    2m33.372s
user    2m22.753s
sys     0m10.657s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make -j2 > /dev/null

real    1m54.851s
user    3m26.145s
sys     0m13.685s
~/coreutils-8.4$

So Hyper-Threading reduces the run time to about 75%, which is equivalent to roughly 33% more processing power. (I ran each build twice to ensure that everything was cached in memory.)
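
For reference, the arithmetic behind those figures, using the first pair of runs (the second pair is within a fraction of a second):

make      : 2m33.375s = 153.4 s
make -j2  : 1m54.707s = 114.7 s
114.7 / 153.4 ≈ 0.75   (run time drops to ~75%)
153.4 / 114.7 ≈ 1.33   (~33% more work per unit time)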

And here is a control experiment showing that make -j2 alone does not improve the speed of compiling coreutils-8.4 on Ubuntu 8.04 x86:

Single-core VM on a Core 2 Quad 2.5 GHz (no HT):

~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make > /dev/null

real    0m44.453s
user    0m38.870s
sys     0m5.500s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make -j2 > /dev/null

real    0m45.131s
user    0m40.450s
sys     0m4.580s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make > /dev/null

real    0m44.621s
user    0m39.090s
sys     0m5.340s
~/coreutils-8.4$ make clean > /dev/null
~/coreutils-8.4$ time make -j2 > /dev/null

real    0m45.165s
user    0m40.390s
sys     0m4.610s
~/coreutils-8.4$
netvope
  • This is great. The control experiment shows that this really makes a difference. Thank you. – Jay Conrod Apr 02 '10 at 15:55
  • I would have liked seeing the measurements repeated on the Atom with HT disabled, assuming that's possible to accomplish. Also, a note on memory use would be nice, as the Atom might start swapping or dropping caches, particularly in the -j2 case. – Eroen Apr 05 '13 at 12:46
  • In-order Atom is worse at exploiting instruction-level parallelism than a Nehalem or Sandybridge-family CPU, or AMD Ryzen. HT might help more on Atom than on a mainstream CPU. Or it might help less, because mainstream CPUs have bigger caches and more execution resources (and higher branch-mispredict penalties, and HT lets the other thread use the CPU while one is recovering from a mis-speculation). So probably HT helps significantly on mainstream CPUs, too, but the ratio may be fairly different. – Peter Cordes Aug 18 '17 at 02:31
  • I would take this experiment with a grain of salt. Normally a single-process build alternates between I/O and CPU work (read, process, write), so just making make use two jobs would improve that overlap on its own. One can try disabling HT from the BIOS and testing, I guess (a runtime toggle is sketched after these comments). – auselen May 22 '20 at 16:42
  • @auselen this is a highly simplified experiment with lots of limitations but the control experiment specifically addresses your concern – netvope May 29 '20 at 17:51
  • https://www.phoronix.com/review/amd-epyc-9754-smt tested SMT on/off on an AMD Bergamo with 128 Zen4c cores. Compile times with clang and GCC were worse with SMT enabled vs. disabled. (With plenty of RAM, and sources hot in disk cache, and already plenty of parallelism in the no-SMT build, since that's a huge number of physical cores.) – Peter Cordes Jul 27 '23 at 19:52
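
For anyone who wants to repeat the runs with Hyper-Threading disabled but can't get at the BIOS: on reasonably recent Linux kernels (4.19 and later, so not the Ubuntu 8.04 system above) SMT can be toggled at runtime through sysfs:

cat /sys/devices/system/cpu/smt/active                     # 1 = SMT currently enabled
echo off | sudo tee /sys/devices/system/cpu/smt/control    # take the sibling threads offline
echo on  | sudo tee /sys/devices/system/cpu/smt/control    # bring them back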
0

It all depends on whether the compiler is written to be multi-threaded or not. If it is, then hyperthreading definitely speeds things up a bit, since the OS can schedule the compiler's threads onto different logical cores. I agree with Ken that compilation is generally more I/O-bound than processing-intensive, so a speedy hard drive is more of a necessity than a speedy processor with hundreds of cores.

ajawad987
  • How about if the compiler is invoked with make -j N (N being the number of logical processors)? I'm concerned that since distinct compiler processes don't share any data, they might actually reduce performance (see the note after these comments). – Jay Conrod Jan 06 '10 at 18:52
  • 1) Compilation (on Linux anyway) can always be made non-I/O-bound, provided sufficient physical memory is present. 2) Popular build systems can invoke many compiler processes in parallel, making multi-threaded compilers a non-issue (less so for linkers, though). – Eroen Apr 05 '13 at 12:44
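
On picking N for make -j N, as asked in the first comment above: nproc reports logical CPUs, so on a hyperthreaded machine it already counts the sibling threads, and halving it gives a rough, scheduler-dependent stand-in for one job per physical core:

make -j"$(nproc)"              # one job per logical CPU (HT siblings included)
make -j"$(( $(nproc) / 2 ))"   # roughly one job per physical core when HT is on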