9

I have this simple hello world program:

#include <stdio.h>

int main() {
  printf("Hello, world!\n");
  return 0;
}

I compiled this program with LLVM Clang (v15.0.1, built from Homebrew, so not Apple's version) like normal, then I ran and timed the outputs. To my surprise, the first time the program ran, it took nearly 10x longer than the second time, but the next three executions run much faster.

$ clang test.c -o test
$ time ./test
Hello, world!

real    0m0.169s
user    0m0.001s
sys 0m0.002s
$ time ./test
Hello, world!

real    0m0.017s
user    0m0.001s
sys 0m0.006s
$ time ./test
Hello, world!

real    0m0.004s
user    0m0.001s
sys 0m0.002s
$ time ./test
Hello, world!

real    0m0.008s
user    0m0.001s
sys 0m0.005s

I'm running this on an Intel Core i5 mac, running macOS Big Sur v11.6.8. The shell is the bash shipped with macOS.

Nothing in my code deals with time, and I don't think there's anything to cache, so I'm not sure why the first execution runs so slow. I suspect that the OS might be doing some kind of optimization, but I don't know what/how. What is the cause of this large discrepancy in runtimes?

Michael M.
  • 10,486
  • 9
  • 18
  • 34
  • 6
    CPUs warms up. Caches too. You don't want a nuclear plant running on max when noone is using it. Same thing with modern CPUs. They have a low speed until you challenge them and then they speed up (if that's how you've set it). – Ted Lyngmo Jan 28 '23 at 01:38
  • 1
    @TedLyngmo I thought that might be the case, but even after 10 minutes the program still runs faster than the first time after compiling. It runs slower than 10ms, but still not nearly 169ms. – Michael M. Jan 28 '23 at 01:40
  • That's not a contradiction. Quite the opposite. The first time the program warmed up the system. Now you're running on a hot system. – Ted Lyngmo Jan 28 '23 at 01:41
  • 3
    Try `echo 3 | sudo tee /proc/sys/vm/drop_caches` before next run. – dimich Jan 28 '23 at 01:48
  • 2
    I've seen the same effect on Linux. Perhaps the timing code itself had to get paged in from disk (it's a bash/zsh built-in, but demand-paging of executables works in 4k chunks). And/or the dynamic library code that runs as part of your binary. Try timing something *else* before the first timing of the newly-compiled binary, e.g. `time /bin/true`. – Peter Cordes Jan 28 '23 at 02:04
  • 6
    CPU-frequency warmup doesn't explain it, @TedLyngmo. The half second between up-arrow and return to run it again is plenty of time for that CPU core to go back into a deep sleep state (and power down its caches). And the binary file containing the executable was just written so it's hot in DRAM (in the pagecache, ready to get mapped into memory). The linker won't have used direct IO to bypass the kernel's buffering/caching. But yes, DRAM cache of other files like the dynamic linker makes sense, not the executable itself. Except clang just ran, so not the dynamic linker or libs. – Peter Cordes Jan 28 '23 at 02:08
  • 1
    @PeterCordes _"CPU-frequency warmup doesn't explain it"_ - I'm 100% sure you're right. The speed and heat correlates (for me) well with what I've seen when I've done similar things and my conclusion was based on that observation only. I have not searched for other factors (no, need, my gf reports when the fans are starting to be a nuisance). – Ted Lyngmo Jan 28 '23 at 02:17
  • 1
    It’s the dynamic linker loading everything in the first run, and running from cache after that. – Richard Barber Jan 28 '23 at 02:20
  • @TedLyngmo: CPU frequency decisions are made every 1 to 10 milliseconds, or less with hardware controllers. Running clang used way more CPU time than `time ./test` does, so if "recent" load more than a second ago was relevant, the first run should be the fastest. (Unless it takes so much longer to type `time ./test` than to up-arrow it...) With a software governor, a process using up its timeslice will typically trigger a jump to max frequency, but that'll age away very soon. – Peter Cordes Jan 28 '23 at 02:29
  • maybe it's because of caching? – justANewb stands with Ukraine Jan 28 '23 at 02:36
  • 8
    I always assumed it was just the MS antivirus checking out the executable the first time it was executed... – Dúthomhas Jan 28 '23 at 02:40
  • @PeterCordes I'm not sure my takeaway from this is correct but _"the first run should be the fastest"_ is quite the opposite from what I've seen on my very old rig (a ~15 year old Xeon). – Ted Lyngmo Jan 28 '23 at 02:45
  • @TedLyngmo: Exactly, that's another way we can rule out the CPU-frequency warmup hypothesis, because it predicts the opposite of what happens. I also usually observe the same thing, that the first run of a newly-linked executable is slower, on Linux on an i7-6700k Skylake, running from tmpfs. If you put multiple runs truly back-to-back, like `time ./test; time ./test` on the same command line, or a shell loop, you could have an effect, but other effects are clearly more significant. – Peter Cordes Jan 28 '23 at 02:54
  • @PeterCordes Oh.... I've monitored the CPUs cores and run tests in random order when I want to compare. I also fixed the CPUs cores at max speed (at one point) to rule spin-up/down out. This is when I've bombarded all cores with tasks and the logs of CPU speed and how quickly things are getting done have always seemed to correlate. (Sidenote: I got "CPU throttling" and then core after core winding down - before Dell released a microcode update). – Ted Lyngmo Jan 28 '23 at 03:05
  • 1
    @RichardBarber Do you have any references in regard to caching of dynamically linked objects? I assume anything could be preloaded and the symbols would have to be relinked. – Moss Richardson Jan 28 '23 at 03:11
  • @TedLyngmo: Yeah, under sustained load, the CPU frequency jumps up and then things run faster. But even 10 ms of no load is enough to drop back to idle, so *this* performance difference of the first run of a fresh executable on a hopefully-idle system isn't explainable by CPU frequency warm-up. Given the OP's report of a persistent effect even after 10 minutes, it could be some kind of caching, or as Dúthomhas suggested, a virus scan or other extra security checks the first time a newly-written file is executed. (Unlikely to be MS, though, on MacOS!). – Peter Cordes Jan 28 '23 at 03:43
  • @TedLyngmo: On Linux with software CPU-frequency management, [Why does this delay-loop start to run faster after several iterations with no sleep?](https://stackoverflow.com/q/38299023) shows that a process not using up a full scheduler timeslice doesn't keep the CPU frequency up. IDK if MacOS would use a similar algorithm or not. – Peter Cordes Jan 28 '23 at 03:48
  • @PeterCordes I can fit that into what I saw back when I tuned it. My server is now really slow to react. Give it a second and then, "oh, work? fun" :-) I've left my desktop mostly "as delivered" and haven't had to do anything but to swap using the cabinet sensor as trigger for the fans. When the GPU starts, the CMOS sensor is closer to the action and then can trigger the fan to get the air pumping out a lot faster. Seems to keep the system a little happier. – Ted Lyngmo Jan 28 '23 at 03:54
  • Looks like [this is a dup](https://stackoverflow.com/questions/13365914/why-the-first-time-c-program-runs-it-runs-10x-slower). Consensus seems to be: disk caching. (And maybe AV.) – Dúthomhas Jan 28 '23 at 19:05
  • 1
    @Dúthomhas: That question talks about sorting "a file of integers", which wouldn't get read during compilation so could be cold, unlike this which doesn't intentionally read other files. And talking about how it gets slow again after restarting the PC, so again clearly disk caching. That doesn't explain this; the binary itself is freshly written; still hot in the pagecache. (Maybe even still dirty; I don't think an OS would need to sync it to disk before mapping it into a process, but perhaps it does so anyway? Although I think I've seen this on Linux in tmpfs so that wouldn't explain it.) – Peter Cordes Jan 28 '23 at 22:27
  • @Dúthomhas: So yes that's related, but this question is more specific; if it is disk caching, a good answer should identify exactly what file is cold, and show that warming it up via some method other than running this program (e.g. running another program?) makes the first run fast. – Peter Cordes Jan 28 '23 at 22:28
  • @MichaelM. Could you try `sync && sudo purge` to eliminate disk caching? – Moss Richardson Jan 29 '23 at 01:26
  • 1
    @MossRichardson The code runs in `0.087` seconds, much slower than runs after it, but about twice as fast as the post-compile run. It looks like disk caching is part of it, but I think there's a bit more to it. – Michael M. Jan 29 '23 at 01:34
  • Some good suggestions here. In my limited understanding of CPU/cache architecture, I'd have just assumed it was a result of caching. Probably not L1, but more likely L2 or L3 cache. – Fiddy Bux Feb 03 '23 at 19:30
  • 10
    None of these comments are remotely relevant. You're clearly running OSX, and the first execution of a binary runs up against the 'System Integrity Protection' (SIP) infrastructure. Hot caching may explain subsequent speedups to a point. It's possible to disable SIP. Don't. – Brett Hale Feb 04 '23 at 17:18
  • 1
    I took a [hello world](https://github.com/below/HelloSilicon/blob/main/Chapter%2001/HelloWorld.s) written in AArch64 assembly, compiled it with clang on macOS, ran it and left it alone for a few days. Now when I execute it with `time ./helloworld`, the first run is takes `real 0m0.051s` and subsequent runs `real 0m0.006s` to `real 0m0.009s`. I'm on this question because I've noticed this too for a while. – Boris Verkhovskiy Mar 16 '23 at 08:06
  • I would suggest trying to run the program from a in memory filesystem rather than disk to see if it is because of disk caching. – anton-tchekov Jul 09 '23 at 21:18

1 Answers1

-2

The behavior you are observing is likely due to the way modern operating systems handle executable files' first-time execution. It's called "file system caching" or "disk caching." When you execute a program for the first time, the operating system loads the program from disk into memory, and subsequent executions are much faster because the OS keeps a cached copy of the program in memory.

Here's what typically happens during the first and subsequent runs:

First Execution:

The operating system loads the executable "test" from disk into memory. The program is executed, and the output is displayed. Since it's the first time running the program, the OS doesn't have a cached copy, and it takes some time to read from the disk and load the program into memory, which leads to a longer execution time. Subsequent Executions:

The operating system already has a cached copy of the "test" program in memory from the first execution. The cached copy is used directly, eliminating the need to read from the disk, resulting in much faster execution times. As you've observed, subsequent executions are significantly faster than the first one. The time difference you see between the first and second execution is likely due to the disk cache and memory cache behavior of the operating system. On subsequent runs, the program's binary and library dependencies may still reside in memory, which makes the execution much faster.

Additionally, the timing differences you see can also be affected by other background processes running on your system during different executions.

This behavior is not specific to macOS; it's a common feature of modern operating systems to improve the overall performance of frequently executed programs.

If you want to get a more consistent benchmark of your program's execution time, you can run it multiple times and take the average time from those runs after the initial cache warm-up.

nudoiba
  • 22
  • 2
  • 4
    On the first run after a compile, the file was just *written* by the compiler so it's already hot in RAM from that. This was [discussed in comments](https://stackoverflow.com/questions/75264808/why-do-programs-run-slower-on-first-run-after-compile#comment132812475_75264808) when the question was first posted. Guesses included the timing code or libraries needing to get paged in, but a C compiler probably warmed up the pages containing the libc functions this simple C program uses. – Peter Cordes Aug 04 '23 at 21:49