Because of Clang's vastly better C++ compile times, I recently added the option to compile a project for an ARM Cortex-M4 microcontroller with Clang instead of the arm-none-eabi-gcc toolchain. The whole process went quite smoothly and I quickly had working ELF and HEX files. It wasn't until yesterday evening that I noticed that the ELF files actually differ quite a lot...
Before I continue let's inspect the ELF produced by GCC to get some kind of baseline.
GCC's ELF contains the following sections (apart from debug stuff):
Section Headers:
  [Nr] Name              Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL      00000000 000000 000000 00      0   0   0
  [ 1] .vector_table     PROGBITS  08000000 010000 000010 00   A  0   0   4
  [ 2] .version          NOBITS    08000010 010010 000010 00  WA  0   0   1
  [ 3] .text             PROGBITS  08000020 010020 000138 00  AX  0   0   4
  [ 4] .rodata           PROGBITS  08000158 010158 000000 00  WA  0   0   1
  [ 5] .data             PROGBITS  20000000 010158 000000 00  WA  0   0   1
  [ 6] .data2            PROGBITS  10000000 010158 000000 00   W  0   0   1
  [ 7] .bss              NOBITS    20000000 020000 00038c 00  WA  0   0 512
  [ 8] .bss2             PROGBITS  2000038c 010158 000000 00   W  0   0   1
  [ 9] ._user_heap_stack NOBITS    2000038c 020000 000e04 00  WA  0   0   1
But despite .data and .bss being marked with an "A" (alloc) flag, the program headers only load the following:
Program Headers:
  Type Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD 0x010000 0x08000000 0x08000000 0x00158 0x00158 RWE 0x10000
  LOAD 0x000000 0x20000000 0x20000000 0x00000 0x01190 RW  0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .vector_table .version .text
   01     .bss ._user_heap_stack
So far so good.
The problem emerged when I tried to create binaries from an ELF produced by Clang. Those files were huge (256 MB), which is nowhere near what I had expected. Now, if you're not familiar with ARM microcontrollers: they usually have FLASH and RAM at very different address locations (e.g. 0x0800'0000 for FLASH and 0x2000'0000 for RAM, as seen above). So I already had a suspicion of what was going on... I checked my linker script and put a NOLOAD directive on every section which goes solely to RAM. Problem solved?
Well... not really. In fact my binaries grew even bigger.
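As far as I understand it, the flat-binary format itself is what makes the files balloon: objcopy -O binary has to emit every byte between the lowest and the highest loaded address, so a single loadable section at a RAM address drags in the entire FLASH-to-RAM gap. A rough sketch of the arithmetic (Python; the base addresses are the ones from above, the section sizes are just illustrative):

```python
# Rough sketch: a flat binary spans from the lowest to the highest
# loadable byte, so one loadable RAM section drags in the whole gap.
FLASH_BASE = 0x0800_0000
RAM_BASE   = 0x2000_0000

def flat_binary_size(sections):
    """sections: (load_address, size) pairs that objcopy would emit."""
    start = min(addr for addr, size in sections)
    end   = max(addr + size for addr, size in sections)
    return end - start

# FLASH-only image: a few hundred bytes.
flash_only = [(FLASH_BASE, 0x10), (FLASH_BASE + 0x20, 0x138)]
print(hex(flat_binary_size(flash_only)))            # 0x158

# Add one loadable section at a RAM address and the binary has to
# cover the whole FLASH-to-RAM gap: ~384 MiB with these bases.
with_ram = flash_only + [(RAM_BASE, 0xE04)]
print(flat_binary_size(with_ram) // (1024 * 1024))  # 384
```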
Let's take a look at Clang's ELF. It bugs me a little that Clang doesn't seem to remove the unwinding section (.ARM.exidx) although I compile with -fno-unwind-tables and -gc-sections, but OK, I can live with those 16 B.
Section Headers:
  [Nr] Name              Type        Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL        00000000 000000 000000 00      0   0   0
  [ 1] .vector_table     PROGBITS    08000000 001000 000010 00   A  0   0   4
  [ 2] .version          PROGBITS    08000010 001010 000010 00   A  0   0   1
  [ 3] .text             PROGBITS    08000020 001020 000334 00  AX  0   0   4
  [ 4] .rodata           PROGBITS    08000354 001354 000000 00  AX  0   0   1
  [ 5] .ARM.exidx        ARM_EXIDX   08000354 001354 000010 00  AL  3   0   4
  [ 6] .preinit_array    PROGBITS    08000364 001364 000000 00   A  0   0   1
  [ 7] .init_array       INIT_ARRAY  08000364 001364 000004 04  WA  0   0   4
  [ 8] .fini_array       FINI_ARRAY  08000368 001368 000004 04  WA  0   0   4
  [ 9] .data             PROGBITS    20000000 002000 000000 00  WA  0   0   1
  [10] .data2            PROGBITS    10000000 002000 000000 00  WA  0   0   1
  [11] .bss              NOBITS      20000000 002000 0001ac 00  WA  0   0 512
  [12] .bss2             PROGBITS    200001ac 002000 000000 00  WA  0   0   1
  [13] ._user_heap_stack PROGBITS    200001ac 002000 000e04 00  WA  0   0   1
Now this is where it gets interesting and where I have no clue what's happening. What are GNU_RELRO and GNU_STACK, and how do they end up there? Why is GNU_STACK at address 0? Any chance these entries are bloating my binaries?
Program Headers:
  Type      Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD      0x001000 0x08000000 0x08000000 0x00364 0x00364 R E 0x1000
  LOAD      0x001364 0x08000364 0x08000364 0x00008 0x00008 RW  0x1000
  GNU_RELRO 0x001364 0x08000364 0x08000364 0x00008 0x00c9c R   0x1
  GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
  EXIDX     0x001354 0x08000354 0x08000354 0x00010 0x00010 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     .vector_table .version .text .rodata .ARM.exidx
   01     .preinit_array .init_array .fini_array
   02     .preinit_array .init_array .fini_array
   03
   04     .rodata .ARM.exidx
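For what it's worth, when I reason about what ends up in a flat binary I only look at LOAD entries with a non-zero FileSiz; marker segments like GNU_STACK (FileSiz 0) or GNU_RELRO (which merely overlaps a LOAD segment) shouldn't contribute any bytes by themselves. A sketch with the rows copied from the table above (Python; this is a simplified segment-level view, objcopy actually works section by section):

```python
# (type, offset, paddr, filesz, memsz) rows from Clang's program headers.
phdrs = [
    ("LOAD",      0x001000, 0x08000000, 0x364, 0x364),
    ("LOAD",      0x001364, 0x08000364, 0x008, 0x008),
    ("GNU_RELRO", 0x001364, 0x08000364, 0x008, 0xC9C),
    ("GNU_STACK", 0x000000, 0x00000000, 0x000, 0x000),
    ("EXIDX",     0x001354, 0x08000354, 0x010, 0x010),
]

# Only LOAD entries with file contents occupy space in a flat binary;
# GNU_STACK has FileSiz 0 and GNU_RELRO just overlaps a LOAD segment.
loaded = [(p, f) for t, _, p, f, _ in phdrs if t == "LOAD" and f > 0]
image_size = max(p + f for p, f in loaded) - min(p for p, f in loaded)
print(hex(image_size))  # 0x36c
```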
Further questions:
- How is GCC able to drop all the RAM sections even though the linker script previously had no NOLOAD directive on them?
- Running size on Clang's ELF counts the minimum stack size I define in the linker script towards the .data section, whereas on GCC's ELF it doesn't. How is that? My linker script contains a section which looks like this:
._user_heap_stack (NOLOAD) :
{
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
} >RAM
- but to my knowledge this should only "check" that there is at least enough RAM left to cover my defined minimum heap and stack sizes. This section doesn't actually contain anything, so how can it be counted towards .data?
I know that I could remove unwanted sections with objcopy when actually creating the binaries, but I'd really like to understand those subtle differences between GCC and Clang.
/edit I just noticed that my ._user_heap_stack section has a different type depending on the compiler (NOBITS vs. PROGBITS). I guess that explains why it's counted towards .data...
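That would match how the Berkeley size output is usually derived: roughly, allocated NOBITS sections count as bss, allocated sections with file contents count as text or data depending on their flags. A sketch of that classification (Python; simplified, not the exact BFD logic):

```python
# Simplified sketch of Berkeley size-style classification (not the
# exact BFD logic): where does a section's size get counted?
def classify(sh_type, flags):
    if "A" not in flags:          # not allocated: ignored entirely
        return None
    if sh_type == "NOBITS":       # occupies memory but not the file
        return "bss"
    if "X" in flags or "W" not in flags:
        return "text"             # code / read-only contents
    return "data"                 # writable, with file contents

# GCC emits ._user_heap_stack as NOBITS, Clang's linker as PROGBITS:
print(classify("NOBITS",   "WA"))   # bss
print(classify("PROGBITS", "WA"))   # data
```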
/edit Now a (potential) bug over @ LLVM https://bugs.llvm.org/show_bug.cgi?id=46299
/edit And closed as of lld 10.0.1 https://bugs.llvm.org/show_bug.cgi?id=46225