
I am working on software running on an embedded ARM platform. In the course of updating our platform, we are switching from an OpenEmbedded based system to Linaro.

On my machine, it currently takes about 9 minutes to cross-compile our software for ARM, using the 32-bit gcc 4.6.4 that OpenEmbedded built for us. For the new system we are now of course trying Linaro's gcc 4.7 binary, with the surprising result that compilation suddenly takes about twice as long (18 minutes). The Linaro gcc 4.6 binary has the same issue, so the slowdown is not specific to the gcc version.

Using Linaro's crosstool-ng to build an adjusted version of their compiler (i.e. trying to get the configure options as close as possible to our old one) did not speed it up.

The main differences between our old gcc compiler and the Linaro one:

  • the old one uses softfp, while Linaro's uses hard float and specifies the FPU
  • the old one targets no particular ARM processor/architecture (arm-none-linux-gnueabi), while Linaro's gcc (arm-linux-gnueabihf) has --with-arch=armv7-a and --with-tune=cortex-a9 explicitly set

Changing gcc configure options such as enabling ssp, thumb/arm mode, multilib, or the target CPU (cortex-a8 vs a9) does not yield an improvement.

Compilation speed already differs for a simple test.cpp that just has a main function using a vector<int>, so the slowdown is not related to linking, and I doubt that the STL header files alone cause that much of a difference.

I am running out of ideas what else to tweak. Does anybody have an idea?


EDIT4: I also tried the ARM cross compiler from Ubuntu 12.04 (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)), and its compilation times are comparable to those of my 4.6.4 version. So there seems to be something particular to Linaro's build: either an option I can't manage to turn off, or some special patch they applied?


EDIT3: -ftime-report from Linaro gcc 4.7 for an actual source file from the project:

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall    2076 kB ( 2%) ggc
 phase parsing           :   1.72 (74%) usr   0.34 (85%) sys   2.04 (75%) wall   66732 kB (79%) ggc
 phase lang. deferred    :   0.28 (12%) usr   0.04 (10%) sys   0.33 (12%) wall   10215 kB (12%) ggc
 phase cgraph            :   0.32 (14%) usr   0.02 ( 5%) sys   0.33 (12%) wall    5481 kB ( 6%) ggc
 phase generate          :   0.60 (26%) usr   0.06 (15%) sys   0.66 (24%) wall   15700 kB (19%) ggc
 |name lookup            :   0.28 (12%) usr   0.02 ( 5%) sys   0.24 ( 9%) wall    8058 kB (10%) ggc
 |overload resolution    :   0.32 (14%) usr   0.06 (15%) sys   0.36 (13%) wall   10042 kB (12%) ggc
 callgraph construction  :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     551 kB ( 1%) ggc
 callgraph optimization  :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     224 kB ( 0%) ggc
 varpool construction    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      94 kB ( 0%) ggc
 df scan insns           :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       5 kB ( 0%) ggc
 df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      22 kB ( 0%) ggc
 alias analysis          :   0.00 ( 0%) usr   0.02 ( 5%) sys   0.00 ( 0%) wall      11 kB ( 0%) ggc
 preprocessing           :   0.08 ( 3%) usr   0.10 (25%) sys   0.29 (11%) wall    1069 kB ( 1%) ggc
 parser (global)         :   0.58 (25%) usr   0.08 (20%) sys   0.43 (16%) wall   25145 kB (30%) ggc
 parser struct body      :   0.28 (12%) usr   0.02 ( 5%) sys   0.34 (12%) wall   12400 kB (15%) ggc
 parser enumerator list  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     121 kB ( 0%) ggc
 parser function body    :   0.14 ( 6%) usr   0.00 ( 0%) sys   0.15 ( 6%) wall    2435 kB ( 3%) ggc
 parser inl. func. body  :   0.10 ( 4%) usr   0.02 ( 5%) sys   0.18 ( 7%) wall    3682 kB ( 4%) ggc
 parser inl. meth. body  :   0.24 (10%) usr   0.02 ( 5%) sys   0.20 ( 7%) wall    5298 kB ( 6%) ggc
 template instantiation  :   0.58 (25%) usr   0.14 (35%) sys   0.75 (28%) wall   26588 kB (31%) ggc
 tree gimplify           :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall     785 kB ( 1%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     543 kB ( 1%) ggc
 tree SSA other          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      32 kB ( 0%) ggc
 out of ssa              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     438 kB ( 1%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       6 kB ( 0%) ggc
 integrated RA           :   0.04 ( 2%) usr   0.00 ( 0%) sys   0.09 ( 3%) wall    1313 kB ( 2%) ggc
 reload                  :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      60 kB ( 0%) ggc
 thread pro- & epilogue  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      92 kB ( 0%) ggc
 final                   :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       4 kB ( 0%) ggc
 rest of compilation     :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     133 kB ( 0%) ggc
 unaccounted todo        :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :   2.32             0.40             2.72              84519 kB

and the same for my gcc-4.6:

Execution times (seconds)
 callgraph construction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     527 kB ( 1%) ggc
 trivially dead code   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 df scan insns         :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       5 kB ( 0%) ggc
 df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      22 kB ( 0%) ggc
 preprocessing         :   0.08 ( 7%) usr   0.10 (26%) sys   0.14 ( 9%) wall    1016 kB ( 1%) ggc
 parser                :   0.68 (58%) usr   0.24 (63%) sys   0.83 (52%) wall   52215 kB (76%) ggc
 name lookup           :   0.28 (24%) usr   0.02 ( 5%) sys   0.41 (26%) wall   10211 kB (15%) ggc
 inline heuristics     :   0.00 ( 0%) usr   0.02 ( 5%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     637 kB ( 1%) ggc
 tree CFG construction :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     463 kB ( 1%) ggc
 expand                :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     426 kB ( 1%) ggc
 varconst              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     132 kB ( 0%) ggc
 integrated RA         :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall     304 kB ( 0%) ggc
 reload                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      58 kB ( 0%) ggc
 machine dep reorg     :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       3 kB ( 0%) ggc
 final                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       4 kB ( 0%) ggc
 rest of compilation   :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     133 kB ( 0%) ggc
 unaccounted todo      :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                 :   1.18             0.38             1.59              68315 kB

EDIT2: Linaro gcc 4.6.3's -ftime-report output for a VERY simple test.cpp (including options -fno-graphite-identity -fno-graphite):

Execution times (seconds)
 preprocessing         :   0.00 ( 0%) usr   0.02 (50%) sys   0.02 (10%) wall     121 kB ( 2%) ggc
 parser                :   0.10 (62%) usr   0.02 (50%) sys   0.11 (55%) wall    4022 kB (65%) ggc
 name lookup           :   0.02 (12%) usr   0.00 ( 0%) sys   0.04 (20%) wall     879 kB (14%) ggc
 tree gimplify         :   0.02 (13%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      20 kB ( 0%) ggc
 expand                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall      34 kB ( 1%) ggc
 integrated RA         :   0.02 (12%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall      59 kB ( 1%) ggc
 TOTAL                 :   0.16             0.04             0.20               6207 kB

and for the same file with my old gcc 4.6.4:

Execution times (seconds)
 preprocessing         :   0.02 (25%) usr   0.00 ( 0%) sys   0.02 (14%) wall     119 kB ( 2%) ggc
 parser                :   0.00 ( 0%) usr   0.04 (100%) sys   0.06 (43%) wall    4021 kB (65%) ggc
 name lookup           :   0.04 (50%) usr   0.00 ( 0%) sys   0.03 (21%) wall     879 kB (14%) ggc
 expand                :   0.02 (25%) usr   0.00 ( 0%) sys   0.01 ( 7%) wall      34 kB ( 1%) ggc
 unaccounted todo      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 7%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.08             0.04             0.14               6204 kB

Generating the preprocessed file with both compilers yielded no significant difference (the output of Linaro's gcc was only 3 lines longer).


EDIT1: gcc -v for the old one (paths shortened or removed, e.g. --sbindir):

# arm-none-linux-gnueabi-g++ -v
Using built-in specs.
COLLECT_GCC=..sysroots/i686-linux/usr/bin/arm-none-linux-gnueabi-g++
COLLECT_LTO_WRAPPER=../libexec/gcc/arm-none-linux-gnueabi/4.6.4/lto-wrapper
Target: arm-none-linux-gnueabi
Configured with: ..tmp/work/armv7a-none-linux-gnueabi/gcc-cross-4.6.3+svnr184847-r27/gcc-4_6-branch/configure --build=i686-linux --host=i686-linux --target=arm-none-linux-gnueabi  --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --disable-multilib --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=arm-none-linux-gnueabi- --without-local-prefix --enable-lto --enable-libssp --disable-bootstrap --disable-libgomp --disable-libmudflap --with-system-zlib --with-linker-hash-style=gnu --with-ppl=no --with-cloog=no --enable-cheaders=c_global --enable-languages=c,c++,fortran --disable-libunwind-exceptions --with-mpfr=..sysroots/i686-linux/usr --with-system-zlib --enable-__cxa_atexit
Thread model: posix
gcc version 4.6.4 20120303 (prerelease) (GCC) 

and Linaro gcc -v

# arm-linux-gnueabihf-g++ -v
Using built-in specs.
COLLECT_GCC=..compiler/bin/arm-linux-gnueabihf-g++
COLLECT_LTO_WRAPPER=../libexec/gcc/arm-linux-gnueabihf/4.7.2/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: .build/src/gcc-linaro-4.7-2012.08/configure --build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --enable-multilib --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard --with-pkgversion='crosstool-NG linaro-1.13.1-2012.08-20120827 - Linaro GCC 2012.08' --with-bugurl=https://bugs.launchpad.net/gcc-linaro --enable-__cxa_atexit --enable-libmudflap --enable-libgomp --enable-libssp --with-gmp=.. --with-mpfr=.. --with-mpc=.. --with-ppl=.. --with-cloog=.. --with-libelf=.. --with-host-libstdcxx='-L.. -lpwl' --enable-threads=posix --disable-libstdcxx-pch --enable-linker-build-id --enable-gold --with-local-prefix=.. --enable-c99 --enable-long-long --with-mode=thumb
Thread model: posix
gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1-2012.08-20120827 - Linaro GCC 2012.08) 

For the latter I also made adjustments to have --disable-multilib --disable-libmudflap --disable-libgomp.

And here's Ubuntu 12.04's arm compiler:

> arm-linux-gnueabihf-g++-4.6 -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabihf-g++-4.6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5'  --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/arm-linux-gnueabihf/include/c++/4.6.3 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=arm-linux-gnueabihf 
Frankie

2 Answers


You could pass -time or -ftime-report to gcc to find out where it spends its compilation time.

But why does the compilation time matter so much to you?

You should care more about the execution time of the produced executable binary.

Also, show us the output of the -v option passed to your gcc.

And you might pass the -j option to your make command to have it work in parallel (e.g. running several gcc processes at once). You could also lower the optimization level, e.g. from -O3 to -O2 or -O1.

Basile Starynkevitch
  • Great suggestion. In fact, it's probably the *only* way to figure out where the difference might be coming from – paulsm4 Sep 15 '12 at 06:17
  • I care about compilation time as this is a non-profit project (university students) and the 8 vs 18 minutes is on my machine. Some of the students already have compilation times of 1 hour on their older machines and they would not be happy if my next system upgrade changes that to 2:20 hours :-) – Frankie Sep 15 '12 at 06:30
  • I already do `make -j`. I added the gcc -v outputs in my question. The `-ftime-report` option causes an internal compiler error, the `-time` gives me `cc1plus 2.36 0.15 \n as 0.00 0.00` for the old, and the new one gave `# cc1plus 5.00 0.12 \n as 0.02 0.00`. – Frankie Sep 15 '12 at 06:42
  • If `-ftime-report` gives you an internal compiler error, please report a bug to GCC. Try also to compile GCC 4.7.1 from GNU source code released on http://gcc.gnu.org – Basile Starynkevitch Sep 15 '12 at 07:37
  • Limiting the compiler flags makes the time report work. For better comparison I also used Linaro's gcc 4.6 compiler. I added the output for a simple test.cpp in the question, strangely most of the additional time seems to be spent in the parser. – Frankie Sep 15 '12 at 07:52
  • But you should use `-ftime-report` on some relevant source file of your application, not a `helloworld` one! If that crashes, it is a GCC bug that you should report (and that might have been fixed since). – Basile Starynkevitch Sep 15 '12 at 07:54
  • If I use a relevant source file, the Linaro gcc 4.6 compiler is having a problem with some pthread related includes (maybe because the runtime does not match?), and the gcc 4.7 compiler has an updated listing that does not match the 4.6 version and makes it harder to compare. But if the issue is already seen in a simple test file, this issue does not seem to be specific to my code anyway. – Frankie Sep 15 '12 at 08:00
  • I added the output of an actual source file as EDIT3 – Frankie Sep 15 '12 at 08:39

OK - your -ftime-report tests clearly show "parser" is the culprit; I'm guessing templates (in general) and STL (in particular) are the root cause.

SUGGESTION:

See if there's any way you can use "precompiled headers" in your tool chain. If you can, that might eliminate the entire problem.

LINKS (unfortunately, I'm not sure which may or may not be applicable to you):

paulsm4
  • My project already runs with precompiled headers. Turning them off or on makes no significant difference in the comparative compilation speeds. Remember, my question is about Linaro's gcc version being only half as fast as a more generic gcc cross compiler. – Frankie Sep 16 '12 at 07:50
  • And remember, your profiling says "parser". Which, to me, implies "templates", "templates in headers" ... and, yes, "STL templates in STL headers". Q: Are you sure your .pch includes the system headers you're using (including STL)? Q: Are you sure the compiler is configured to use the pch's? And remember: "Linaro slower" appears to be an unfortunate Fact of Life. The *real* question is "how can we speed it up?" IMHO... – paulsm4 Sep 16 '12 at 17:46
  • I use the exact same Makefile for both compilers (just swapping out the compiler name). If I speed up Linaro's gcc in any generic way, this would also speed up the other gcc's speed and my question would still be open - what is it that Linaro's gcc version does (in parsing apparently) that the other gcc does not? – Frankie Sep 16 '12 at 20:14