
Compiling my project takes ages, and I would like to improve its compile time. The first thing I am trying to do is to break the compile time down into the individual files.

So that the compiler tells me, for example:

boost/variant.hpp: took 100ms in total
myproject/foo.hpp: took 25ms in total
myproject/bar.cpp: took 125ms in total

I could then specifically try to improve the compile time of the files taking up the most time, by introducing forward declarations and/or reordering things so I can omit include files.

Is there something for this task? I am using GCC and ICC (Intel C++).


I use SCons as my build system.

Johannes Schaub - litb
  • Just put `time` before `gcc`. – David Schwartz Dec 17 '12 at 17:52
  • @DavidSchwartz can you please explain what to tell `time` so that it spies on GCC and outputs times for the individual headers and source files processed by GCC? – Johannes Schaub - litb Dec 17 '12 at 17:54
  • You can't output times for individual headers. That doesn't even make any sense. (Say a header has a macro but that macro is invoked in a source file, is that time processing the header or the source file?) But you will get individual source file times so long as your GCC invocation only asks to compile a single source file. (Which is what any sane makefile will do.) – David Schwartz Dec 17 '12 at 17:56
  • @DavidSchwartz the time for defining the macro counts toward the header, and the time for using it counts toward the source. That makes perfect sense to me :) – Johannes Schaub - litb Dec 17 '12 at 17:57
  • @DavidSchwartz unfortunately most `.cc` files take ages. so it would really be helpful to have per-header information. – Johannes Schaub - litb Dec 17 '12 at 17:59
  • Maybe it would help to analyze the dependencies between the include files in the first place (using the `-H` [option](http://stackoverflow.com/a/42513/21567)). Maybe you can get rid of some. It’s been a while for me, but last I looked most C++ build time was actually due to IO. – Christian.K Dec 17 '12 at 18:05
  • @Christian.K it has hundreds of files. I only want to eliminate the top-most performance eater; I don't care about saving only a few milliseconds. – Johannes Schaub - litb Dec 17 '12 at 18:12
  • @JohannesSchaub-litb: You're taking a very bizarre approach to a very common problem. Why not just use `time g++` to measure it by `cpp` file and then sort? That will almost certainly give you the information you actually need. If it's not obvious why a particular slow `cpp` file is slow, then you have too much in that file. – David Schwartz Dec 17 '12 at 18:19
  • Since everyone is piling on, another reason this is a funny metric: if `foo.hpp` and `bar.hpp` are in some way inter-dependent (for example they include stuff in common or they instantiate the same template specialization) then the order in which they're included might affect how the "cost" of compiling the two of them together is split between the two of them. TMP functions are naturally memoized by the compiler :-) – Steve Jessop Dec 17 '12 at 19:32
  • So perhaps you could (for each header file) write a .cpp file that includes it and does nothing else, and time that. Then for each .cpp file, put a `#ifdef` around everything except its includes at the top, and time it with and without its "own" contents. That will give you a rough idea how heavy each header file is and how heavy each source file is excluding its headers, and might be close enough for jazz (a rough sketch appears after this comment thread). – Steve Jessop Dec 17 '12 at 19:35
  • Clang and gcc both have a `-ftime-report` flag, it's not exactly what you ask for, but it may help to start with. – Matthieu M. Dec 17 '12 at 19:39
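
A rough sketch of the per-header probe idea from the comments above (hedged: it assumes a POSIX shell, GNU `time`, and a hypothetical `headers.txt` listing the headers to measure; adjust the include paths for your tree):

```sh
# For each header, generate a translation unit that does nothing but
# include it, then time a compile of that unit on its own. The elapsed
# time is a rough proxy for how heavy the header is.
while read -r hdr; do
  echo "#include \"$hdr\"" > /tmp/probe.cpp
  printf '%s: ' "$hdr"
  # -fsyntax-only skips code generation; drop it to measure that too.
  /usr/bin/time -f '%e s' g++ -I. -fsyntax-only /tmp/probe.cpp
done < headers.txt

# For a per-pass breakdown of a single translation unit, GCC and
# Clang also accept the -ftime-report flag mentioned above:
g++ -ftime-report -c myproject/bar.cpp -o /dev/null
```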

3 Answers


You have an unusual, quirky definition of the time spent processing header files that doesn't match what anyone else uses. So you can probably make this happen, but you'll have to do it yourself. Probably the best way is to run `gcc` under `strace -tt`. You can then see when it opens, reads, and closes each file, which tells you how long it spends processing each one.
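
A hedged sketch of what that could look like (assumes Linux; `-f` matters because the `gcc` driver forks `cc1plus`, which does the actual parsing, and the file name here is just an example):

```sh
# Timestamp every file-related syscall the compiler makes; each
# header's open ... close window then brackets the time spent
# reading and parsing it.
strace -f -tt -e trace=open,openat,read,close \
    g++ -c myproject/bar.cpp -o /dev/null 2> gcc-trace.log

# Pull out the header events to inspect the per-file intervals.
grep -E '\.(h|hpp)' gcc-trace.log
```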

David Schwartz
  • This will give the time spent doing I/O and maybe parsing. It won't include template expansion or optimization passes, which typically account for the bulk of CPU time. – Ben Voigt Dec 17 '12 at 18:07
  • @BenVoigt: If you read his answer to my comment above, you'll see that he wants that counted toward the main CPP file, not the header file. That's why I said he had an unusual definition and specifically suggested this method for him. – David Schwartz Dec 17 '12 at 18:12
  • I'm not arguing with your answer, just clarifying what the drawbacks of this definition are. – Ben Voigt Dec 17 '12 at 18:13
  • @DavidSchwartz ideally, the template costs should be reported separately, like "template cost: 1st phase: XXX ms, 2nd phase: YYY ms" or something like that. The parse time of a template **is** the header's cost, and I want it counted against the header, not its user. – Johannes Schaub - litb Dec 17 '12 at 18:14

The important metric is not how long it takes to process (whatever that means) a header file, but how often the header file changes and forces the build system to reinvoke the compiler on all dependent units.

The time the compiler spends parsing useless code is really small compared to all the other steps of the compilation process. Even if you include entire unneeded files, they're likely hot in disk cache. And precompiled headers make this even better.
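
For reference, with GCC a precompiled header is just the header compiled on its own: the compiler writes a `.gch` file next to it and loads it automatically whenever that header is included with matching options (a sketch; `all.hpp` is a hypothetical umbrella header):

```sh
# Precompile the umbrella header once; this produces all.hpp.gch.
g++ -O2 -x c++-header all.hpp

# Later compiles that #include "all.hpp" with the same options
# load the .gch instead of reparsing the header.
g++ -O2 -c myproject/bar.cpp
```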

The goal is to avoid recompiling units due to unrelated changes in header files. That's where techniques such as pimpl and other compile firewalls come in.

And link-time code generation (a.k.a. whole-program optimization) makes matters worse, by undoing compile-time firewalls and reprocessing the entire program anyway.

Anyway, information on how unstable a header file is should be attainable from build logs, commit logs, even last modified date in the filesystem.
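
One way to mine that information from version control (a sketch, assuming the project lives in a git checkout; the one-year window is arbitrary):

```sh
# Rank headers by how often they changed recently; the churniest
# ones are the rebuild triggers worth firewalling off or keeping
# out of any precompiled header.
git log --since='1 year ago' --name-only --pretty=format: \
  | grep -E '\.(h|hpp)$' \
  | sort | uniq -c | sort -rn | head -20
```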

Ben Voigt
  • If the time spent parsing headers were that negligible, then precompiled headers would not be needed, would they? But in reality, using PCHs often gives a very significant compile-time improvement. Actually, I got to this question while googling for a way to decide which headers should go into the PCH in the first place. – Alex Che May 11 '20 at 20:01
  • @AlexChe: I mentioned precompiled headers in my answer, didn't I? They are the reason that the processing time for the header doesn't matter... because it doesn't get repeated, the PCH gets reused. Unless parts of your precompiled header are unstable. Which is why the focus of my answer is that the problem is header instability, and I mention ways to identify unstable headers (which in turn answers your question about which ones do or don't belong in a PCH -- read my last paragraph). – Ben Voigt May 11 '20 at 20:13
  • Sorry if I've misunderstood your answer. So, say I'm not using PCH yet and want to decide what headers are more important to be precompiled. Does header processing time become an important metric then? – Alex Che May 11 '20 at 20:18
  • @AlexChe: No, really not. Your stable header files go in the PCH. The ones that get changed during development should not be in the PCH. Because the cost of processing *your* headers is tiny compared to the cost of processing the platform headers (and the OS headers they include, and so on down, which is hundreds of files). If there were a way to load a PCH (of platform headers), process a couple of project headers, and write out a bigger PCH... but there isn't. The major compilers treat loading a PCH and building a PCH as mutually exclusive operations. – Ben Voigt May 11 '20 at 21:02
  • Sure, I was speaking only about stable headers (platform, 3rd party libs), not my headers. – Alex Che May 11 '20 at 21:40
  • @Johannes, I wonder if you ever got an answer to the question that was asked in your original post? You clicked "accept" on this one, but this answer doesn't really tell you how much time was spent on `variant.hpp` vs `foo.hpp` and `bar.cpp`, right (as described in the *title* and body of your post, which is what led me here when I searched something like "compile time by file")? Ben Voigt suggested looking at "build logs, commit logs, even last modified date in the filesystem"; did those give you what you were looking for? Otherwise why did you click accept? I'm looking for something similar. – Nike Dec 12 '22 at 22:17

Have you tried instrumenting the build as a whole yet? Like any performance problem, it's likely that what you think is the problem is not actually the problem. Electric Make is a GNU-make-compatible implementation of make that can produce an XML-annotated build log, which can in turn be used for analysis of build performance issues with ElectricInsight. For example, the "Job Time by Type" report in ElectricInsight can tell you broadly what activities consume the most time in your build, and specifically which jobs in the build are longest. That will help you to direct your efforts to the places where they will be most fruitful.

For example:

*[screenshot of an ElectricInsight report]*

Disclaimer: I am the chief architect of Electric Make and ElectricInsight.

Eric Melski