
I am profiling the compilation of my code to find out why it is so slow. I am using gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and have added the compiler flag -ftime-report.

I notice that the compilation units that are slow to compile spend most of their time in the "phase opt and generate" stage. What exactly is this stage, and how can I reduce the time spent in it?

For reference, here is the output for one of the compilation units:

Time variable                                   usr           sys          wall               GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)    1579 kB (  0%)
 phase parsing                      :   1.74 ( 20%)   0.71 ( 44%)   2.46 ( 24%)  311927 kB ( 36%)
 phase lang. deferred               :   1.33 ( 15%)   0.34 ( 21%)   1.67 ( 16%)  259524 kB ( 30%)
 phase opt and generate             :   5.68 ( 65%)   0.58 ( 36%)   6.26 ( 60%)  301021 kB ( 34%)
 phase last asm                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       2 kB (  0%)
 |name lookup                       :   0.44 (  5%)   0.12 (  7%)   0.49 (  5%)   15499 kB (  2%)
 |overload resolution               :   0.76 (  9%)   0.22 ( 13%)   0.92 (  9%)  130607 kB ( 15%)
 garbage collection                 :   0.33 (  4%)   0.01 (  1%)   0.34 (  3%)       0 kB (  0%)
 dump files                         :   0.18 (  2%)   0.04 (  2%)   0.10 (  1%)       0 kB (  0%)
 callgraph construction             :   0.12 (  1%)   0.03 (  2%)   0.14 (  1%)    6318 kB (  1%)
 callgraph optimization             :   0.16 (  2%)   0.04 (  2%)   0.19 (  2%)      82 kB (  0%)
 ipa function summary               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    2289 kB (  0%)
 ipa dead code removal              :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)       0 kB (  0%)
 ipa inheritance graph              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      29 kB (  0%)
 ipa virtual call target            :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)       3 kB (  0%)
 ipa cp                             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    1140 kB (  0%)
 ipa inlining heuristics            :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    2438 kB (  0%)
 ipa function splitting             :   0.00 (  0%)   0.01 (  1%)   0.01 (  0%)     451 kB (  0%)
 ipa profile                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 ipa pure const                     :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)      40 kB (  0%)
 ipa icf                            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       4 kB (  0%)
 ipa SRA                            :   0.10 (  1%)   0.00 (  0%)   0.05 (  0%)    9838 kB (  1%)
 cfg cleanup                        :   0.08 (  1%)   0.01 (  1%)   0.08 (  1%)    1621 kB (  0%)
 trivially dead code                :   0.03 (  0%)   0.00 (  0%)   0.06 (  1%)       0 kB (  0%)
 df scan insns                      :   0.02 (  0%)   0.01 (  1%)   0.05 (  0%)      18 kB (  0%)
 df multiple defs                   :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)       0 kB (  0%)
 df reaching defs                   :   0.06 (  1%)   0.00 (  0%)   0.04 (  0%)       0 kB (  0%)
 df live regs                       :   0.19 (  2%)   0.01 (  1%)   0.25 (  2%)       0 kB (  0%)
 df live&initialized regs           :   0.05 (  1%)   0.00 (  0%)   0.06 (  1%)       0 kB (  0%)
 df use-def / def-use chains        :   0.03 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 df reg dead/unused notes           :   0.08 (  1%)   0.00 (  0%)   0.07 (  1%)    2152 kB (  0%)
 register information               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 alias analysis                     :   0.03 (  0%)   0.00 (  0%)   0.09 (  1%)    5413 kB (  1%)
 alias stmt walking                 :   0.08 (  1%)   0.00 (  0%)   0.13 (  1%)     738 kB (  0%)
 register scan                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     167 kB (  0%)
 rebuild jump labels                :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 preprocessing                      :   0.15 (  2%)   0.21 ( 13%)   0.39 (  4%)   11918 kB (  1%)
 parser (global)                    :   0.29 (  3%)   0.21 ( 13%)   0.51 (  5%)  105494 kB ( 12%)
 parser struct body                 :   0.18 (  2%)   0.04 (  2%)   0.22 (  2%)   39504 kB (  5%)
 parser enumerator list             :   0.01 (  0%)   0.01 (  1%)   0.00 (  0%)    1305 kB (  0%)
 parser function body               :   0.18 (  2%)   0.04 (  2%)   0.15 (  1%)    9096 kB (  1%)
 parser inl. func. body             :   0.27 (  3%)   0.02 (  1%)   0.39 (  4%)   33105 kB (  4%)
 parser inl. meth. body             :   0.21 (  2%)   0.06 (  4%)   0.25 (  2%)   23541 kB (  3%)
 template instantiation             :   1.61 ( 18%)   0.43 ( 26%)   2.05 ( 20%)  346006 kB ( 40%)
 constant expression evaluation     :   0.05 (  1%)   0.03 (  2%)   0.02 (  0%)    1470 kB (  0%)
 early inlining heuristics          :   0.00 (  0%)   0.01 (  1%)   0.03 (  0%)    3751 kB (  0%)
 inline parameters                  :   0.06 (  1%)   0.02 (  1%)   0.05 (  0%)   12991 kB (  1%)
 integration                        :   0.12 (  1%)   0.04 (  2%)   0.26 (  3%)   53810 kB (  6%)
 tree gimplify                      :   0.06 (  1%)   0.02 (  1%)   0.11 (  1%)   20691 kB (  2%)
 tree eh                            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    2821 kB (  0%)
 tree CFG construction              :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)    8987 kB (  1%)
 tree CFG cleanup                   :   0.11 (  1%)   0.02 (  1%)   0.13 (  1%)     208 kB (  0%)
 tree tail merge                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     880 kB (  0%)
 tree VRP                           :   0.17 (  2%)   0.00 (  0%)   0.18 (  2%)    7001 kB (  1%)
 tree Early VRP                     :   0.05 (  1%)   0.00 (  0%)   0.05 (  0%)    7256 kB (  1%)
 tree copy propagation              :   0.00 (  0%)   0.00 (  0%)   0.05 (  0%)     104 kB (  0%)
 tree PTA                           :   0.13 (  1%)   0.05 (  3%)   0.25 (  2%)    1906 kB (  0%)
 tree PHI insertion                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     736 kB (  0%)
 tree SSA rewrite                   :   0.06 (  1%)   0.01 (  1%)   0.04 (  0%)    6289 kB (  1%)
 tree SSA other                     :   0.00 (  0%)   0.02 (  1%)   0.03 (  0%)     940 kB (  0%)
 tree SSA incremental               :   0.08 (  1%)   0.00 (  0%)   0.03 (  0%)    1717 kB (  0%)
 tree operand scan                  :   0.08 (  1%)   0.00 (  0%)   0.08 (  1%)   19096 kB (  2%)
 dominator optimization             :   0.18 (  2%)   0.01 (  1%)   0.15 (  1%)    5240 kB (  1%)
 backwards jump threading           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     244 kB (  0%)
 tree SRA                           :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)    1712 kB (  0%)
 tree CCP                           :   0.10 (  1%)   0.02 (  1%)   0.10 (  1%)    1097 kB (  0%)
 tree reassociation                 :   0.00 (  0%)   0.01 (  1%)   0.00 (  0%)      50 kB (  0%)
 tree PRE                           :   0.15 (  2%)   0.01 (  1%)   0.18 (  2%)    4977 kB (  1%)
 tree FRE                           :   0.13 (  1%)   0.02 (  1%)   0.12 (  1%)    2498 kB (  0%)
 tree linearize phis                :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     563 kB (  0%)
 tree forward propagate             :   0.09 (  1%)   0.00 (  0%)   0.10 (  1%)    1071 kB (  0%)
 tree phiprop                       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      11 kB (  0%)
 tree conservative DCE              :   0.04 (  0%)   0.01 (  1%)   0.02 (  0%)     133 kB (  0%)
 tree aggressive DCE                :   0.04 (  0%)   0.01 (  1%)   0.04 (  0%)    7238 kB (  1%)
 tree DSE                           :   0.00 (  0%)   0.01 (  1%)   0.03 (  0%)     254 kB (  0%)
 tree loop invariant motion         :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      17 kB (  0%)
 scev constant prop                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     112 kB (  0%)
 tree loop unswitching              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     349 kB (  0%)
 complete unrolling                 :   0.01 (  0%)   0.01 (  1%)   0.03 (  0%)    1141 kB (  0%)
 tree slp vectorization             :   0.01 (  0%)   0.02 (  1%)   0.03 (  0%)    5032 kB (  1%)
 tree iv optimization               :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)    2110 kB (  0%)
 predictive commoning               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     302 kB (  0%)
 gimple CSE reciprocals             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 dominance computation              :   0.14 (  2%)   0.03 (  2%)   0.16 (  2%)       0 kB (  0%)
 out of ssa                         :   0.05 (  1%)   0.00 (  0%)   0.01 (  0%)      55 kB (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    1422 kB (  0%)
 expand                             :   0.03 (  0%)   0.01 (  1%)   0.10 (  1%)   14790 kB (  2%)
 post expand cleanups               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    1273 kB (  0%)
 varconst                           :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)       8 kB (  0%)
 jump                               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 forward prop                       :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)    1330 kB (  0%)
 CSE                                :   0.13 (  1%)   0.00 (  0%)   0.08 (  1%)     664 kB (  0%)
 dead code elimination              :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)       0 kB (  0%)
 dead store elim1                   :   0.02 (  0%)   0.00 (  0%)   0.06 (  1%)    1230 kB (  0%)
 dead store elim2                   :   0.05 (  1%)   0.00 (  0%)   0.03 (  0%)    1584 kB (  0%)
 loop init                          :   0.11 (  1%)   0.02 (  1%)   0.07 (  1%)    8638 kB (  1%)
 loop versioning                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      40 kB (  0%)
 loop invariant motion              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       8 kB (  0%)
 CPROP                              :   0.12 (  1%)   0.00 (  0%)   0.06 (  1%)    3321 kB (  0%)
 PRE                                :   0.08 (  1%)   0.00 (  0%)   0.05 (  0%)     935 kB (  0%)
 CSE 2                              :   0.07 (  1%)   0.00 (  0%)   0.08 (  1%)     333 kB (  0%)
 branch prediction                  :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)    1178 kB (  0%)
 combiner                           :   0.21 (  2%)   0.00 (  0%)   0.15 (  1%)    7070 kB (  1%)
 if-conversion                      :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     464 kB (  0%)
 integrated RA                      :   0.25 (  3%)   0.01 (  1%)   0.30 (  3%)   20626 kB (  2%)
 LRA non-specific                   :   0.10 (  1%)   0.00 (  0%)   0.09 (  1%)    1243 kB (  0%)
 LRA virtuals elimination           :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     834 kB (  0%)
 LRA reload inheritance             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     195 kB (  0%)
 LRA create live ranges             :   0.11 (  1%)   0.01 (  1%)   0.13 (  1%)     234 kB (  0%)
 LRA hard reg assignment            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 LRA rematerialization              :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 reload                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 reload CSE regs                    :   0.09 (  1%)   0.00 (  0%)   0.06 (  1%)    2212 kB (  0%)
 load CSE after reload              :   0.06 (  1%)   0.00 (  0%)   0.05 (  0%)     559 kB (  0%)
 ree                                :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)      71 kB (  0%)
 thread pro- & epilogue             :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)     939 kB (  0%)
 peephole 2                         :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     170 kB (  0%)
 hard reg cprop                     :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)      15 kB (  0%)
 scheduling 2                       :   0.15 (  2%)   0.00 (  0%)   0.16 (  2%)     894 kB (  0%)
 machine dep reorg                  :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)     502 kB (  0%)
 reorder blocks                     :   0.04 (  0%)   0.00 (  0%)   0.01 (  0%)    1015 kB (  0%)
 shorten branches                   :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 final                              :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)    3408 kB (  0%)
 straight-line strength reduction   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      21 kB (  0%)
 tree loop if-conversion            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     203 kB (  0%)
 rest of compilation                :   0.10 (  1%)   0.01 (  1%)   0.13 (  1%)    3241 kB (  0%)
 remove unused locals               :   0.02 (  0%)   0.00 (  0%)   0.08 (  1%)       3 kB (  0%)
 address taken                      :   0.04 (  0%)   0.01 (  1%)   0.04 (  0%)       0 kB (  0%)
 TOTAL                              :   8.75          1.63         10.40         874064 kB

Edit: A few people asked in the comments for the compiler flags, so here they are:

-std=c++17 -Wall -Ofast -DNDEBUG -Wno-deprecated-declarations
cyrusbehr
  • Maybe try making your code simpler and easier for the compiler to compile? – Thomas Matthews Jun 22 '21 at 23:53
  • What compiler flags are you currently using? – Nate Eldredge Jun 23 '21 at 00:20
  • In almost every performance question on SO about C++, people want to know the compiler flags. (Yes, I know this is about compile-time performance.) Why not provide them up front? We expect this in all SO C++ performance questions. – Yakk - Adam Nevraumont Jun 23 '21 at 05:16
  • Seems related: https://stackoverflow.com/questions/373142/what-techniques-can-be-used-to-speed-up-c-compilation-times?rq=1 – prehistoricpenguin Jun 23 '21 at 08:16
  • I added the compiler flags to the bottom of the question @NateEldredge – cyrusbehr Jun 23 '21 at 17:14
  • It looks to me like most of the lines of the report (from "name lookup" onward) break down the phases listed in the first few lines. There are no obvious hotspots, except maybe "template instantiation", which suggests the obvious strategy "use fewer templates"; they are well known to slow down compilation. If you need fast compilation for your development cycle, you could disable optimization (`-O0` instead of `-Ofast`), which should speed up the process a lot; then turn it back on for final testing and release. – Nate Eldredge Jun 23 '21 at 20:28
