gcc LTO: Limit scope of optimization

Question

An LTO build of a rather large shared library (many template instantiations) takes a long time (>10 min). I know a few things about the library, so I could specify some kind of "blacklist" of object files that do not need to be analyzed together (because there are no calls among them that would benefit from inlining or similar), or I could specify groups of object files that should be analyzed together. Is this possible somehow (without splitting up the library)?

You could just not build with `LTO` while developing and only turn it on for a release candidate? — Galik, Feb 27 '18 at 22:07
Repeated local builds are also necessary when analyzing and fixing performance problems. — Martin Richtarsky, Feb 27 '18 at 22:14
I am not sure you would win much. Did you try `-flto=8` (or whatever number or `-flto=jobserver`) to get some parallelism? — Marc Glisse, Mar 06 '18 at 18:23
I'm already using `-flto=40` :) The operation of LTO is described here: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html There are three phases: LGEN, WPA, LTRANS. WPA partitions the code, and LTRANS then runs in parallel on the partitions. I can see around 15 threads running during the LTRANS phase, but it should be more. I would need to explicitly guide the partitioning of WPA to change that. — Martin Richtarsky, Mar 07 '18 at 20:27

score 5 · Accepted Answer · answered Mar 07 '18 at 21:18

There is a little-used feature of ld called -r/--relocatable that combines multiple object files into one, which can later be linked into the final product. If you can get LTO to happen at this step, but not later, you get the kind of "partial" LTO you're looking for.

Sadly ld -r won't work; it just combines all the LTO information to be processed later. But invoking it via the gcc driver (gcc -r) seems to work:

a.c

int a() {
    return 42;
}

b.c

int a(void);

int b() {
    return a();
}

c.c

int b(void);

int c() {
    return b();
}

d.c

int c(void);

int main() {
    return c();
}

$ gcc -O3 -flto -c [a-d].c
$ gcc -O3 -r -nostdlib a.o b.o -o g1.o
$ gcc -O3 -r -nostdlib c.o d.o -o g2.o
$ gcc -O3 -fno-lto g1.o g2.o
$ objdump -d a.out
...
00000000000004f0 <main>:
 4f0:   e9 1b 01 00 00          jmpq   610 <b>
...
0000000000000610 <b>:
 610:   b8 2a 00 00 00          mov    $0x2a,%eax
 615:   c3                      retq   
...

So main() got optimized to return b();, and b() got optimized to return 42;, but there were no interprocedural optimizations between the two groups.
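
The same recipe can be scripted for more groups. The following is only a minimal sketch, not something I have measured: the helper name `partial_lto` and the `CC`, `OPT`, and `REL_FLAGS` variables are hypothetical. It prints the commands rather than running them, so they can be reviewed first. As far as I can tell from the GCC manual, GCC 9 and later keep the LTO bytecode on `-r` by default, and `-flinker-output=nolto-rel` is the documented way to force code generation at this step, so `REL_FLAGS` is left as a hook for that.

```shell
#!/bin/sh
# Sketch: print the commands for per-group ("partial") LTO.
# Hypothetical names: partial_lto, CC, OPT, REL_FLAGS.
CC=${CC:-gcc}
OPT=${OPT:--O3}
# Assumption: on GCC 9+ this may need to be -flinker-output=nolto-rel.
REL_FLAGS=${REL_FLAGS:-}

# partial_lto OUT.o IN.o...
# Prints the relocatable-link command that runs LTO within one group only.
partial_lto() {
    out=$1
    shift
    printf '%s\n' "$CC $OPT${REL_FLAGS:+ $REL_FLAGS} -r -nostdlib $* -o $out"
}

partial_lto g1.o a.o b.o
partial_lto g2.o c.o d.o
# Final link: the group objects should already hold native code, so LTO is off.
printf '%s\n' "$CC $OPT -fno-lto g1.o g2.o"
```

With the defaults this prints the three link commands used above; pipe the output to `sh` to execute them.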

Thanks for posting this. I've updated my answer as well. – Hadi Brais Mar 08 '18 at 03:50

score 3 · Answer 2 · answered Mar 06 '18 at 17:46

Assume that you want to optimize a.c and b.c together as one group and c.c and d.c as another group. You can use the -combine GCC switch as follows:

$ gcc -O3 -c -combine a.c b.c -o group1.o
$ gcc -O3 -c -combine c.c d.c -o group2.o

Note that you don't need to use LTO because the -combine switch combines multiple source code files before optimizing the code.

Edit

-combine is only supported for C code (and, as far as I know, it was removed in GCC 4.6, so it is unavailable in current releases). An alternative way to achieve this is to use the #include directive as follows:

// file group1.cpp
#include "a.cpp"
#include "b.cpp"

// file group2.cpp
#include "c.cpp"
#include "d.cpp"

Then they can be compiled without using LTO as follows:

$ g++ -O3 group1.cpp group2.cpp

This effectively emulates grouped or partial LTO.

However, it's not clear whether this technique or the one proposed in the other answer compiles faster. The code may also not be optimized in exactly the same way, so the performance of the resulting code should be compared for each technique before choosing one.
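
If the groups change often, the wrapper files can be generated instead of maintained by hand. This is a minimal sketch; the function name `unity_source` is hypothetical. Note that sources which share file-local names (static functions, anonymous-namespace symbols, macros) may clash once they end up in the same translation unit.

```shell
#!/bin/sh
# Sketch: print a wrapper translation unit that #includes each given source.
# Hypothetical name: unity_source.
# Usage: unity_source a.cpp b.cpp > group1.cpp
unity_source() {
    for src in "$@"; do
        printf '#include "%s"\n' "$src"
    done
}
```

For the example above, `unity_source a.cpp b.cpp > group1.cpp` and `unity_source c.cpp d.cpp > group2.cpp` would produce the two group files, minus the leading comment lines.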

edited Mar 08 '18 at 03:44

answered Mar 06 '18 at 17:46

Hadi Brais


Thanks for the suggestion. As far as I understand, `-combine` only works for C code, not for C++. – Martin Richtarsky Mar 07 '18 at 20:23
@MartinRichtarsky I just checked the manual, you're right. I missed that bit, sorry. Check this [alternative](https://stackoverflow.com/questions/6489627/how-to-use-multiple-source-files-to-create-a-single-object-file-with-gcc) technique. Otherwise, I'm not sure whether `ld -r` works with LTO though. – Hadi Brais Mar 07 '18 at 20:48
`ld -r` works with LTO but not really the way OP wants. It seems to just concatenate the LTO information, and the final link will do LTO over the whole program anyway. On the other hand, `gcc -r -nostdlib` seems to do what OP wants. – Tavian Barnes Mar 07 '18 at 20:51
@TavianBarnes What does `gcc -r -nostdlib` do? Does it produce a native object file? Does it work with LTO? – Hadi Brais Mar 07 '18 at 20:54
`gcc -r` is supposed to be a wrapper for `ld -r` I believe, in the same sense that you can do your final link with `gcc` instead of `ld`. The `-nostdlib` is so `gcc` doesn't get confused and pass a bunch of `-lc -lm -lgcc_s` stuff to `ld`. http://shitwefoundout.com/wiki/Combining_object_files – Tavian Barnes Mar 07 '18 at 20:58
@TavianBarnes So you're saying that `gcc -r` will LTO-optimize all the input source files and produce an optimized native object file? Because this is exactly what OP is looking for. – Hadi Brais Mar 07 '18 at 21:02
It seems so, yes. I tested with four files in two groups like you, and checked the disassembly to see that inlining had occurred between {a,b}.o, and between {c,d}.o, but not across the groups. – Tavian Barnes Mar 07 '18 at 21:05
@TavianBarnes Well, if that worked, then you should post an answer and get the bounty. `gcc -r` would be an ideal solution I think. – Hadi Brais Mar 07 '18 at 21:06
@HadiBrais Thanks for the answer. I think this will work in principle, but I fear it will cause problems when compiling some of our source files together. But it will be an alternative should `gcc -r` not work for some reason. – Martin Richtarsky Mar 08 '18 at 21:32

score 0 · Answer 3 · answered Mar 05 '18 at 16:40

You can exclude an object file from the link-time optimization process completely by simply building it without -flto.
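
As a minimal sketch of what that could look like in a build script (the names `compile_cmd`, `NO_LTO`, `CC`, and `OPT` are hypothetical, and d.c is just an example of an excluded file), the compile command can be chosen per source file. The script only prints the commands:

```shell
#!/bin/sh
# Sketch: choose per-file compile flags so that listed sources skip LTO.
# Hypothetical names: compile_cmd, NO_LTO, CC, OPT.
CC=${CC:-gcc}
OPT=${OPT:--O3}
NO_LTO=${NO_LTO:-d.c}   # space-separated sources to build without -flto

# compile_cmd SRC: print the compile command for one source file.
compile_cmd() {
    case " $NO_LTO " in
        *" $1 "*) printf '%s\n' "$CC $OPT -c $1" ;;
        *)        printf '%s\n' "$CC $OPT -flto -c $1" ;;
    esac
}

for src in a.c b.c c.c d.c; do
    compile_cmd "$src"
done
```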

flapenguin


I do not want to fully exclude files, I just want to guide the optimizer with annotations like "optimize these object files together". – Martin Richtarsky Mar 05 '18 at 22:30
AFAIK, you can't do that. You either build an object file with the structures for LTO (i.e. some GCC bytecode) alongside the regular machine code, or without them. – flapenguin Mar 06 '18 at 11:58
