2

I have a larger C++ program which starts out by reading thousands of small text files into memory and storing the data in STL containers. This takes about a minute. Periodically, a build will exhibit behavior where that initial part of the program runs at only about 22-23% CPU load. Once that step is over, it goes back to ~100% CPU. It is more likely to happen with the -O2 flag turned on, but not consistently. It happens even less often with the -p flag, which makes it almost impossible to profile. I did capture it once, but the gprof output wasn't helpful - everything runs at the same relative speed, just at low CPU usage.

I am quite certain that this has nothing to do with multiple cores. I do have a quad-core CPU, and most of the code is multi-threaded, but I tested this issue running a single thread. Also, when I run the problematic step in multiple threads, each thread runs at only ~20% CPU.

I apologize ahead of time for the vagueness of the question but I have run out of ideas as to how to troubleshoot it further, so any hints might be helpful.

UPDATE: Just to make sure it's clear, the problematic part of the code does sometimes (~30-40% of builds) run at 100% CPU, so it's hard to buy the (otherwise reasonable) argument that I/O is the bottleneck.

confusedCoder
  • 223
  • 1
  • 10
  • 3
    It doesn't need the full CPU to read in the files, since it is IO-bound in the first part? – Daniel Fischer Dec 12 '12 at 22:53
  • 1
    Maybe sometimes the file system is busy doing something else and so it slows down your program. Make sure all the files are local and that the OS does not start doing its housekeeping. – QuentinUK Dec 12 '12 at 22:58
  • 1
    Just means your CPU can process the files faster than the disk sub-system can read the disk. – Martin York Dec 12 '12 at 23:07
  • Without profiling all you can do is guess. A good guess is the IO. You probably shouldn't disqualify that possibility without really profiling first. See http://stackoverflow.com/q/26663/716443 for some tool suggestions. – DavidO Dec 12 '12 at 23:29

3 Answers

4

It's the buffer cache


My guess is that you are seeing the results of the Linux buffer cache in operation.

Those thousands of files will take a long time to read in from the disk, and the CPU will mostly be waiting on rotational and seek latencies. Reported CPU time, expressed as a percentage of elapsed time, will be low, even though the overall elapsed time is probably greater.

But once read, those small files are completely cached in memory and accessing each file (in subsequent runs) becomes a purely CPU-bound activity.

Whether the blocks remain in the cache depends on intervening activity, such as recompiles. When new programs are run and other files are read, those programs and files will be cached and old blocks will be dropped; obviously, a memory-intensive workload will also clear out the buffer cache.
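One way to see this effect directly is to compare wall-clock time against CPU time for the file-loading phase. Below is a minimal, self-contained sketch, not the asker's actual code: the directory argument, the use of std::filesystem, and the std::vector<std::string> container are all illustrative assumptions. On a cold cache the cpu/wall ratio will sit well below 100%; on a warm cache it approaches 100%.

```cpp
// Minimal sketch: compare wall-clock time with CPU time while reading
// many small files. A large gap between the two means the process is
// mostly waiting on disk I/O (cold cache); nearly equal times indicate
// a warm buffer cache. C++17, Linux/POSIX.
#include <chrono>
#include <ctime>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <directory>\n";
        return 1;
    }

    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::vector<std::string> contents;  // stand-in for the real STL containers
    for (const auto& entry : std::filesystem::recursive_directory_iterator(argv[1])) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::ostringstream buf;
        buf << in.rdbuf();              // slurp the whole small file
        contents.push_back(buf.str());
    }

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    const double wall_s = std::chrono::duration<double>(wall_end - wall_start).count();
    const double cpu_s  = double(cpu_end - cpu_start) / CLOCKS_PER_SEC;

    std::cout << "files: " << contents.size()
              << "  wall: " << wall_s << " s"
              << "  cpu: "  << cpu_s  << " s"
              << "  cpu/wall: " << (cpu_s / wall_s) * 100.0 << "%\n";
}
```

Running it twice back to back, or after clearing the cache with `sync; echo 3 > /proc/sys/vm/drop_caches` (as in the comments below), should reproduce the fast/slow behavior described in the question. Older GCC releases may also need -lstdc++fs when linking std::filesystem code.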

DigitalRoss
  • 143,651
  • 25
  • 248
  • 329
  • yeah you guys are totally right. I tested it by clearing the cache between runs of the same compilation. Stackoverflow rocks! – confusedCoder Dec 12 '12 at 23:59
  • @confusedCoder - I'm curious, how did you clear the cache between runs? – phonetagger Dec 13 '12 at 00:27
  • @phonetagger `sync; echo 3 > /proc/sys/vm/drop_caches` as described here http://go2linux.garron.me/linux/2011/01/how-clear-or-drop-cache-buffer-pages-linux-memory-880 – confusedCoder Dec 13 '12 at 00:45
3

Since you're reading a ton of small files, your program is blocked waiting on disk I/O for the majority of the time. Because the CPU isn't busy while it waits for the disk to ship the data, you're seeing a load of significantly less than 100%. Once that's over, you're CPU-bound, and your program will eat all available CPU time.

The fact that it works faster sometimes is because (as Jarryd & DigitalRoss mention) once you've read them into system memory, they're in the OS's cache, so subsequent loads will be an order of magnitude faster, unless they've been evicted by other disk I/O. So if you run the program back-to-back, the 2nd run will probably be much faster. If you wait a while (and do other stuff in the meantime), there may have been enough other disk I/O to evict those files from the cache, in which case it will take a long time to read them again.
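If you want to re-create the slow, cache-cold case for just these input files, without root access and without dropping the whole system cache, Linux lets a process hint that a file's pages can be evicted. The sketch below is an illustrative helper (the function name is made up, not from any answer here); it assumes Linux/POSIX and only affects clean, unmodified pages.

```cpp
// Minimal sketch: evict specific files from the Linux page cache without
// root, using posix_fadvise(POSIX_FADV_DONTNEED). This forces the next
// read of those files to go back to the disk. Clean (unmodified) pages
// only; dirty pages would need an fsync first.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Hypothetical helper, not part of the original program.
bool evict_from_page_cache(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        std::perror(path);
        return false;
    }
    // offset 0, length 0 => advise on the entire file
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc == 0;
}

int main(int argc, char* argv[]) {
    for (int i = 1; i < argc; ++i)
        evict_from_page_cache(argv[i]);
}
```

Run it over the input files between timing runs; the following read of those files should hit the disk again rather than the cache.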

phonetagger
  • 7,701
  • 3
  • 31
  • 55
  • 1
    To test this hypothesis, put the files on a flash drive and see if the problem gets better. If it does, it's time to shop for an SSD. – Mark Ransom Dec 12 '12 at 23:00
  • Then on other runs it won't be delayed because those files are now in memory and don't have to be read off the disk again. – Jarryd Dec 12 '12 at 23:02
  • 1
    @MarkRansom - Uh, maybe. If it's a fast flash drive & it's on a fast bus. I have some old 1G USB1.0 thumb drives that are a whole lot slower than even a slow hard drive. – phonetagger Dec 12 '12 at 23:02
  • Please see the update. I don't see how it can be an I/O bottleneck, because sometimes it does run fine at 100%. – confusedCoder Dec 12 '12 at 23:04
  • 1
    It runs at 100% if the blocks are cached from a previous run. – DigitalRoss Dec 12 '12 at 23:06
  • 1
    Read my previous comment. The I/O bottleneck won't occur sometimes once the data has been read in because the OS keeps a cache of hard drive data. – Jarryd Dec 12 '12 at 23:07
3

In addition to the other answers mentioning the buffer cache, if you want to understand what is going on during a compilation, you could pass some of the flags below to GCC (i.e. to g++, probably as a CXXFLAGS setting in your Makefile); see the usage example after the list:

  • -v to ask g++ to show the involved subprocesses (e.g. cc1plus for the C++ compiler proper)
  • -time to ask g++ to report the time of each sub-process
  • -ftime-report to ask g++ (actually cc1plus) to report the time of internal phases or passes inside the compiler.
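For example, assuming a conventional Makefile that honors CXXFLAGS, a one-off instrumented build could be run as `make clean && make CXXFLAGS="-O2 -v -time -ftime-report"`; the extra output goes to stderr, so redirect it (e.g. `2> build-times.log`) if you want to compare runs.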
Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547