GCC can profile itself, which lets you see which part of the compilation process takes the longest.
A sample output:
Time variable usr sys wall GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1369 kB ( 0%)
phase parsing : 5.76 ( 72%) 2.38 ( 87%) 9.27 ( 78%) 554966 kB ( 80%)
phase lang. deferred : 0.50 ( 6%) 0.16 ( 6%) 0.67 ( 6%) 62109 kB ( 9%)
phase opt and generate : 1.58 ( 20%) 0.18 ( 7%) 1.78 ( 15%) 66512 kB ( 10%)
phase last asm : 0.14 ( 2%) 0.02 ( 1%) 0.15 ( 1%) 4587 kB ( 1%)
phase finalize : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
|name lookup : 0.90 ( 11%) 0.36 ( 13%) 1.71 ( 14%) 17506 kB ( 3%)
|overload resolution : 0.78 ( 10%) 0.24 ( 9%) 1.17 ( 10%) 68510 kB ( 10%)
garbage collection : 0.58 ( 7%) 0.00 ( 0%) 0.79 ( 7%) 0 kB ( 0%)
dump files : 0.07 ( 1%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
callgraph construction : 0.31 ( 4%) 0.02 ( 1%) 0.29 ( 2%) 26559 kB ( 4%)
callgraph optimization : 0.03 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 10 kB ( 0%)
ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 412 kB ( 0%)
ipa inlining heuristics : 0.01 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 282 kB ( 0%)
ipa pure const : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 26 kB ( 0%)
cfg cleanup : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1 kB ( 0%)
trivially dead code : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 2 kB ( 0%)
df scan insns : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 2 kB ( 0%)
df live regs : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 24 kB ( 0%)
df reg dead/unused notes : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 172 kB ( 0%)
register information : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
alias stmt walking : 0.03 ( 0%) 0.01 ( 0%) 0.00 ( 0%) 241 kB ( 0%)
rebuild jump labels : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
preprocessing : 0.66 ( 8%) 0.61 ( 22%) 1.53 ( 13%) 45104 kB ( 7%)
parser (global) : 0.78 ( 10%) 0.59 ( 22%) 1.47 ( 12%) 107059 kB ( 16%)
parser struct body : 0.77 ( 10%) 0.18 ( 7%) 1.00 ( 8%) 64460 kB ( 9%)
parser enumerator list : 0.05 ( 1%) 0.02 ( 1%) 0.07 ( 1%) 2628 kB ( 0%)
parser function body : 0.20 ( 3%) 0.10 ( 4%) 0.35 ( 3%) 9952 kB ( 1%)
parser inl. func. body : 0.35 ( 4%) 0.19 ( 7%) 0.62 ( 5%) 25224 kB ( 4%)
parser inl. meth. body : 1.20 ( 15%) 0.28 ( 10%) 1.49 ( 13%) 110313 kB ( 16%)
template instantiation : 1.60 ( 20%) 0.48 ( 18%) 2.55 ( 21%) 172942 kB ( 25%)
constant expression evaluation : 0.10 ( 1%) 0.05 ( 2%) 0.08 ( 1%) 1091 kB ( 0%)
early inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 292 kB ( 0%)
inline parameters : 0.01 ( 0%) 0.01 ( 0%) 0.03 ( 0%) 2592 kB ( 0%)
integration : 0.14 ( 2%) 0.08 ( 3%) 0.11 ( 1%) 8382 kB ( 1%)
tree gimplify : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 3581 kB ( 1%)
tree eh : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 373 kB ( 0%)
tree CFG construction : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1628 kB ( 0%)
tree CFG cleanup : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 10 kB ( 0%)
tree SSA other : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 183 kB ( 0%)
tree SSA incremental : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 100 kB ( 0%)
tree operand scan : 0.07 ( 1%) 0.00 ( 0%) 0.07 ( 1%) 2924 kB ( 0%)
tree CCP : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 118 kB ( 0%)
tree FRE : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 363 kB ( 0%)
tree forward propagate : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 112 kB ( 0%)
tree aggressive DCE : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 152 kB ( 0%)
tree DSE : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 28 kB ( 0%)
PHI merge : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 4 kB ( 0%)
dominance computation : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
expand vars : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 153 kB ( 0%)
expand : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 3103 kB ( 0%)
varconst : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 6 kB ( 0%)
forward prop : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 176 kB ( 0%)
CSE : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 15 kB ( 0%)
dead store elim1 : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 117 kB ( 0%)
dead store elim2 : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 149 kB ( 0%)
loop init : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 647 kB ( 0%)
branch prediction : 0.01 ( 0%) 0.02 ( 1%) 0.02 ( 0%) 229 kB ( 0%)
combiner : 0.03 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 96 kB ( 0%)
integrated RA : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 1862 kB ( 0%)
LRA non-specific : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 62 kB ( 0%)
LRA create live ranges : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 10 kB ( 0%)
reload CSE regs : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 101 kB ( 0%)
thread pro- & epilogue : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 131 kB ( 0%)
hard reg cprop : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 8 kB ( 0%)
machine dep reorg : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 7 kB ( 0%)
reg stack : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
final : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1906 kB ( 0%)
symout : 0.44 ( 6%) 0.07 ( 3%) 0.49 ( 4%) 87737 kB ( 13%)
variable tracking : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 901 kB ( 0%)
var-tracking dataflow : 0.08 ( 1%) 0.00 ( 0%) 0.08 ( 1%) 34 kB ( 0%)
var-tracking emit : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1180 kB ( 0%)
initialize rtl : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 12 kB ( 0%)
rest of compilation : 0.05 ( 1%) 0.00 ( 0%) 0.01 ( 0%) 289 kB ( 0%)
remove unused locals : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1 kB ( 0%)
address taken : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
TOTAL : 7.98 2.74 11.90 689554 kB
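For reference, this is roughly how I rank the rows of a report like the one above by wall time. It is only a sketch: the names `parse_time_report` and `slowest` are my own, and the regular expression assumes the exact row layout shown here (other GCC versions may format the GGC column differently). It only reorders what -ftime-report already prints, so it does not give any per-template or per-function breakdown.

```python
import re

# One report row, assuming the layout shown in the table above:
#   name : usr ( p%) sys ( p%) wall ( p%) N kB ( p%)
# The header and the TOTAL line have no percentages and are skipped.
ROW = re.compile(
    r"^\s*\|?\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<ggc>\d+)\s*kB"
)


def parse_time_report(text):
    """Return one dict per timing row found in the report text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "name": m["name"],
                "usr": float(m["usr"]),
                "sys": float(m["sys"]),
                "wall": float(m["wall"]),
                "ggc_kb": int(m["ggc"]),
            })
    return rows


def slowest(rows, n=5, include_phases=False):
    """Return the n rows with the largest wall time, largest first."""
    if not include_phases:
        # "phase ..." rows are coarse summaries that overlap the detailed rows.
        rows = [r for r in rows if not r["name"].startswith("phase ")]
    return sorted(rows, key=lambda r: r["wall"], reverse=True)[:n]


# A few rows copied from the report above, used as sample input.
SAMPLE = """\
Time variable usr sys wall GGC
phase parsing : 5.76 ( 72%) 2.38 ( 87%) 9.27 ( 78%) 554966 kB ( 80%)
|name lookup : 0.90 ( 11%) 0.36 ( 13%) 1.71 ( 14%) 17506 kB ( 3%)
preprocessing : 0.66 ( 8%) 0.61 ( 22%) 1.53 ( 13%) 45104 kB ( 7%)
template instantiation : 1.60 ( 20%) 0.48 ( 18%) 2.55 ( 21%) 172942 kB ( 25%)
TOTAL : 7.98 2.74 11.90 689554 kB
"""

if __name__ == "__main__":
    for row in slowest(parse_time_report(SAMPLE), n=3):
        print(f'{row["name"]:<25} {row["wall"]:6.2f} s')
```

On the full report this puts template instantiation at the top of the non-phase rows, which is what prompted the rest of this question.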
I would like to know more than this, however. For example, which files take the longest to compile? Which functions? In particular, my compilation bottleneck seems to be template instantiation (roughly 20% of the total time above), so I would like to know exactly which templates take the longest.
I tried looking this up, but all I can find is documentation on how to generate the table above.
The table is generated by adding -ftime-report to the g++ flags.