
Normally, one can get the compiler's optimized assembly output for a source file by passing the -S flag to GCC or Clang, as in the following example.

gcc -O3 -S -c -o foo.s foo.c

But suppose I compile all of my source files using -O3 -flto to enable link-time whole-program optimizations and want to see the final compiler-generated optimized assembly for a function, and/or see where/how code gets inlined.

The result of compiling is a bunch of .o files which are really IR files disguised as object files, as expected. In linking an executable or shared library, these are then smushed together, optimized as a whole, and then compiled into the target binary.

But what if I want assembly output from this procedure? That is, the assembly source that results after link-time optimizations, during the compilation of IR to assembly, and before the actual assembly and linkage into the final executable.

I tried simply adding a -S flag to the link step, but that didn't really work.

I know disassembling the executable is possible, even interleaving with source, but sometimes it's nicer to look at actual compiler-generated assembly, especially with -fverbose-asm.

Mona the Monad
  • You can always disassemble; with `-g` there are labels on every function, and block -> source-line-number debug info that lets `objdump -drwC -S -l` [interleave disassembly with source](https://stackoverflow.com/questions/1289881/using-gcc-to-produce-readable-assembly). Worth a try, IDK if that works. Not as nice as `gcc -S -fverbose-asm` to have named outputs, though. – Peter Cordes Oct 31 '19 at 07:39
  • Disassembly is what I currently do, but I was wondering if there is any way to do it without needing to do that :c (i.e. a big fat assembly file that, itself, can be assembled into the final binary) – Mona the Monad Oct 31 '19 at 12:25
  • In the LLVM case, would `llvm-link`'ing all of the "object" files together before passing `-S` work? – Mona the Monad Oct 31 '19 at 12:31
  • I don't know, that's why I upvoted the question and posted a workaround as a comment! – Peter Cordes Oct 31 '19 at 12:54
  • I don't have Clang on me at the moment to test it, but I think it would work. However, that's LLVM-specific, and I want to find a unified solution, if possible. – Mona the Monad Oct 31 '19 at 18:53
  • Did you ever figure this one out? I'm trying to get ASM output in godbolt with clang, but -flto messes it up. – David Ledger Apr 06 '20 at 02:34

1 Answer


For GCC, just add -save-temps to the link command:

$ gcc -flto -save-temps ... *.o -o bin/libsortcheck.so
$ ls -1
...
libsortcheck.so.ltrans0.s
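
As a rough end-to-end sketch of the GCC flow above (the two source files, the `demo` output name, and the temporary directory are all made up for illustration), the following compiles with -flto, links with -save-temps, and then lists whatever post-LTO assembly files were left behind. The exact file name varies between GCC versions, so the sketch matches on a pattern rather than assuming one name.

```shell
#!/bin/sh
# Sketch only: a hypothetical two-file project built in a scratch directory.
set -eu

if ! command -v gcc >/dev/null 2>&1; then
    echo "gcc not found; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sources: add() lives in its own translation unit, so it can
# only be inlined into main() at link time.
cat > add.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

cat > main.c <<'EOF'
int add(int a, int b);
int main(int argc, char **argv) { (void)argv; return add(argc, 2); }
EOF

# Compile step: the .o files carry GCC's LTO intermediate representation.
gcc -O3 -flto -c add.c -o add.o
gcc -O3 -flto -c main.c -o main.o

# Link step: -save-temps keeps the intermediate files, including the
# assembly generated after link-time optimization.
gcc -O3 -flto -save-temps add.o main.o -o demo

# List the post-LTO assembly files (name pattern differs across GCC versions).
ls -1 | grep 'ltrans.*\.s$'
```

Larger programs may be split into several partitions, in which case more than one such file should show up.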

For Clang the situation is more complicated. If you use GNU ld (the default, or selected via -fuse-ld=ld) or the Gold linker (enabled via -fuse-ld=gold), you need to pass -Wl,-plugin-opt=emit-asm:

$ clang tmp.c -flto -Wl,-plugin-opt=emit-asm -o tmp.s

For newer (11+) versions of the LLD linker (enabled via -fuse-ld=lld), you can generate assembly with -Wl,--lto-emit-asm.
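
A minimal sketch of the LLD variant, assuming clang and an LLD of version 11 or newer are installed and targeting ELF (the source files and the `demo.s` name are hypothetical). With this flag the file named by -o should receive assembly text rather than a linked binary.

```shell
#!/bin/sh
# Sketch only: a hypothetical two-file project built in a scratch directory.
set -eu

if ! command -v clang >/dev/null 2>&1 || ! command -v ld.lld >/dev/null 2>&1; then
    echo "clang or ld.lld not found; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

cat > add.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

cat > main.c <<'EOF'
int add(int a, int b);
int main(int argc, char **argv) { (void)argv; return add(argc, 2); }
EOF

# Compile step: with -flto, clang emits LLVM bitcode into the .o files.
clang -O3 -flto -c add.c -o add.o
clang -O3 -flto -c main.c -o main.o

# Link step: ask LLD to stop after LTO code generation and write assembly
# to the output path instead of producing an executable.
clang -O3 -flto -fuse-ld=lld -Wl,--lto-emit-asm add.o main.o -o demo.s

# Show where main is defined in the generated assembly.
grep -n '^main:' demo.s
```

If the installed LLD predates the option, the link step fails with an unknown-argument error, so the version requirement is worth checking first.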

yugr
  • Doesn't seem to work for me on `clang`, but it seems to be getting there. Adding that option seems to make it emit `.bc` and `.ii` for individual sources, but nothing yet for the actual binary. The closest I got was with `-Wl,-plugin-opt=save-temps`, which output `*.{preopt,internalize,opt,precodegen}.bc` files and a `*.lto.o`, but no assembly source. – Mona the Monad Dec 28 '21 at 20:06
  • @MonatheMonad thanks for the feedback, for some reason I thought the question was GCC-only. Please check the updated answer. – yugr Dec 29 '21 at 08:00
  • Any final binaries built from the resulting monolithic assembly file should be effectively identical to ones built "normally" (i.e. without `-Wl,-plugin-opt=emit-asm`), as there is nothing left to do but assembly and linkage, right? I want to make this assembly generation a side-effect of some debug builds, and I wonder whether a simple `clang -shared -o foo.so foo.so.s` would suffice. – Mona the Monad Dec 31 '21 at 18:54
  • @MonatheMonad this is definitely true for GCC: if you look at `strace`, you can see that it actually assembles the exact same ltrans0.s file that is generated under `-save-temps`. For Clang the situation is less clear; I'll look into this later. – yugr Jan 01 '22 at 09:38
  • @MonatheMonad judging from the code, the situation for Clang seems to be the same, but I'd do some verification to be sure (e.g. compare binaries for both LTO flows). – yugr Jan 01 '22 at 10:00