Whose task is data-structure alignment? Is it the compiler, the linker, the loader, or the hardware itself, as in the case of x86? Does the compiler do relative aligned addressing, such that when 'placed' correctly by the linker in the compiled executable, the data structures are always aligned to their respective native-size boundaries? What further tasks does the loader have to do after that?
3 Answers
The answer is that both the compiler and the linker[0] need to understand and handle alignment requirements. The compiler is the smart one of the pair, as only it understands the actual structure, stack and variable alignment rules, but it propagates some information about required alignment to the linker, which also needs to respect it when generating the final executable.
The compiler takes care of a lot of runtime alignment handling and, conversely, also often relies on the fact that certain minimum alignments are met[1]. The existing answers here cover what the compiler does in some detail.
What is missing is that the linker and loader framework also deal with alignment. Generally speaking each section has a minimum alignment attribute, and the linker writes that attribute and the loader respects it, ensuring that the section is loaded on a boundary at least as aligned as that attribute.
Different sections will have different requirements, and changes to the code can affect those directly. A simple example is global data, whether it is in the `.bss`, `.rodata`, `.data` or some other section. These sections will have an alignment at least as large as the largest alignment requirement for any object stored therein.
So if you have a read-only (`const`) global object with 64-byte alignment, your `.rodata` section will have a minimum alignment of 64 bytes, and the linker will ensure this requirement is met.
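A minimal sketch of that 64-byte case, assuming a C11 compiler. The names `struct table`, `lookup_table` and `table_is_64_byte_aligned` are made up for illustration. The `static_assert` is what the compiler knows at compile time; the function checks that the final address, after linking and loading, still honours it.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>    /* uintptr_t */

/* Hypothetical over-aligned type: the member forces 64-byte alignment. */
struct table {
    alignas(64) unsigned char bytes[64];
};

/* Compile-time view: the compiler records the requirement on the type. */
static_assert(alignof(struct table) == 64, "struct table must be 64-byte aligned");

/* A read-only global; on typical ELF toolchains it ends up in .rodata. */
const struct table lookup_table = { {1, 2, 3} };

/* Run-time view: returns 1 if the object's final address is a multiple of 64. */
int table_is_64_byte_aligned(void)
{
    return (uintptr_t)&lookup_table % 64 == 0;
}
```

If you compile this to an object file and run `objdump -h` on it, the `.rodata` entry should show an `Algn` of at least `2**6` on a typical GCC/ELF setup.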
You can use `objdump -h` to see the actual alignment requirements of any object file in the `Algn` column. Here's a random example:
Sections:
Idx Name Size VMA LMA File off Algn Flags
0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 0000000000400254 0000000000400254 00000254 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 0000000000400274 0000000000400274 00000274 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 00000030 0000000000400298 0000000000400298 00000298 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00000288 00000000004002c8 00000000004002c8 000002c8 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 00000128 0000000000400550 0000000000400550 00000550 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 00000036 0000000000400678 0000000000400678 00000678 2**1 CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 00000050 00000000004006b0 00000000004006b0 000006b0 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 00000060 0000000000400700 0000000000400700 00000700 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 00000210 0000000000400760 0000000000400760 00000760 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .init 0000001a 0000000000400970 0000000000400970 00000970 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 00000170 0000000000400990 0000000000400990 00000990 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt.got 00000008 0000000000400b00 0000000000400b00 00000b00 2**3 CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .text 000021e2 0000000000400b10 0000000000400b10 00000b10 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .fini 00000009 0000000000402cf4 0000000000402cf4 00002cf4 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .rodata 00000700 0000000000402d00 0000000000402d00 00002d00 2**5 CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .eh_frame_hdr 000000b4 0000000000403400 0000000000403400 00003400 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame 000003d4 00000000004034b8 00000000004034b8 000034b8 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .init_array 00000008 0000000000603e10 0000000000603e10 00003e10 2**3 CONTENTS, ALLOC, LOAD, DATA
19 .fini_array 00000008 0000000000603e18 0000000000603e18 00003e18 2**3 CONTENTS, ALLOC, LOAD, DATA
20 .jcr 00000008 0000000000603e20 0000000000603e20 00003e20 2**3 CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 000001d0 0000000000603e28 0000000000603e28 00003e28 2**3 CONTENTS, ALLOC, LOAD, DATA
22 .got 00000008 0000000000603ff8 0000000000603ff8 00003ff8 2**3 CONTENTS, ALLOC, LOAD, DATA
23 .got.plt 000000c8 0000000000604000 0000000000604000 00004000 2**3 CONTENTS, ALLOC, LOAD, DATA
24 .data 00000020 00000000006040d0 00000000006040d0 000040d0 2**4 CONTENTS, ALLOC, LOAD, DATA
25 .bss 000001c8 0000000000604100 0000000000604100 000040f0 2**5 ALLOC
26 .comment 00000034 0000000000000000 0000000000000000 000040f0 2**0 CONTENTS, READONLY
The alignment requirements here vary from `2**0` (no alignment needed) to `2**5` (align on a 32-byte boundary).
Beyond the candidates you mentioned, the runtime also needs to be alignment aware. This topic is somewhat complex, but basically you can be sure that `malloc` and related functions return memory suitably aligned for any fundamental type (which usually just means 8-byte or 16-byte aligned on 64-bit systems), although things get more complicated when you are talking about over-aligned types, or C++ `alignas`.
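To illustrate the difference between the fundamental guarantee and over-aligned storage, here is a small sketch assuming a C11 library that provides `aligned_alloc` (glibc does; some platforms do not). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, max_align_t */
#include <stdint.h>  /* uintptr_t */
#include <stdlib.h>  /* malloc, aligned_alloc, free */

/* Returns 1 if p is a multiple of alignment (alignment must be non-zero). */
int is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}

/* malloc only promises alignment suitable for any fundamental type,
   i.e. the alignment of max_align_t. Returns 1 if that holds here. */
int malloc_meets_fundamental_alignment(void)
{
    void *p = malloc(100);
    if (p == NULL)
        return 0;
    int ok = is_aligned(p, _Alignof(max_align_t));
    free(p);
    return ok;
}

/* For over-aligned storage you have to ask explicitly. C11 aligned_alloc
   expects the size to be a multiple of the alignment (128 = 2 * 64). */
int aligned_alloc_meets_64(void)
{
    void *p = aligned_alloc(64, 128);
    if (p == NULL)
        return 0;
    int ok = is_aligned(p, 64);
    free(p);
    return ok;
}
```

A plain `malloc` result may happen to be 64-byte aligned, but nothing guarantees it, which is why the over-aligned case needs a separate call.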
[0] I had originally just grouped the (compile-time) linker and (runtime) loader together as they are really two sides of the same coin (and indeed much of the linking is actually runtime linking). After looking more carefully into the loading process, however, it seems that the loader may just load the segments (sections) at their existing file offsets, automatically respecting the alignment set up by the linker.
[1] Less so on platforms like x86, where unaligned access is usually allowed, but on platforms where alignment restrictions are stricter, code may actually fail if incorrect alignment is encountered.
The shortest answer that I believe to be correct: it's the compiler's job.
This is why there are various `#pragma`s and other compiler-level magic knobs you can twist to control alignment, when you must.
I don't think the C language specifies the existence of those other components (linker and loader).
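A rough sketch of two such knobs, assuming a C11 compiler that also accepts `#pragma pack` (GCC, Clang and MSVC do; it is not standard C). GCC and Clang additionally offer `__attribute__((aligned(N)))`, which is not shown. The struct names and the helper are hypothetical.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stddef.h>    /* size_t */

/* Default layout: the compiler pads after tag so that value is aligned. */
struct plain { char tag; int value; };

/* Non-standard knob: pack to 1-byte boundaries, removing the padding. */
#pragma pack(push, 1)
struct packed { char tag; int value; };
#pragma pack(pop)

/* Standard C11 knob: raise the alignment of a member (and so of the struct). */
struct wide { alignas(16) char tag; int value; };

static_assert(sizeof(struct packed) == sizeof(char) + sizeof(int),
              "packing removes all padding");
static_assert(alignof(struct wide) == 16, "alignas raises the struct alignment");

/* Bytes of padding the compiler added to the default layout. */
size_t plain_padding(void)
{
    return sizeof(struct plain) - sizeof(struct packed);
}
```

All of these decisions are made while compiling; the linker never sees the individual fields.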

-
Well, linking is hinted at in the C11 standard, even the word linker is mentioned in a footnote; and other mentions like "Translation units may be separately translated and then later linked to produce an executable program." - of course the standard doesn't mandate that a separate linker must exist. – Antti Haapala -- Слава Україні Feb 24 '17 at 09:52
-
It is understandable that the compiler designers cannot say much about the linker, much less the loader, but could I say that the actual mechanics depend on the implementation, which includes the linker and, more natively, the loader? – user2338150 Feb 24 '17 at 09:56
-
They don't need to be mentioned to be alignment-aware: the standard says how things have to work, then the toolchain implements it in a way that makes sense. In most cases, this means that the entire toolchain, from the compiler to the linker through to the loader/runtime linkers, needs to understand alignment. Indeed, the standard doesn't even require a linker or a compiler (you could have a conforming interpreted C, for example), but in practice that's exactly how it is implemented on most platforms. – BeeOnRope Feb 27 '17 at 02:45
Data alignment is tightly coupled with code generation.
Consider all the burden of generating the prologue and epilogue of a function that has local variables aligned on some boundary [live example].
The two listings below are generated from the same function but with different stack alignment: the left one does not realign the stack, while the right one realigns it to an 8-byte boundary (`and esp, -8`).
    foo(double):                    foo(double):
    push ebp                        lea ecx, [esp+4]
    mov ebp, esp                    and esp, -8
    sub esp, 40                     push DWORD PTR [ecx-4]
    mov eax, DWORD PTR [ebp+8]      push ebp
    mov DWORD PTR [ebp-40], eax     mov ebp, esp
    mov eax, DWORD PTR [ebp+12]     push ecx
    mov DWORD PTR [ebp-36], eax     sub esp, 20
    fld1                            mov eax, ecx
    fstp QWORD PTR [ebp-8]          fld1
    fld QWORD PTR [ebp-40]          fstp QWORD PTR [ebp-16]
    fstp QWORD PTR [ebp-16]         fld QWORD PTR [eax]
    fld QWORD PTR [ebp-8]           fstp QWORD PTR [ebp-24]
    fmul QWORD PTR [ebp-16]         fld QWORD PTR [ebp-16]
    leave                           fmul QWORD PTR [ebp-24]
    ret                             add esp, 20
                                    pop ecx
                                    pop ebp
                                    lea esp, [ecx-4]
                                    ret
While this example refers to the alignment of the stack, its purpose is to show the complications that arise.
Structure alignment works the same.
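As a small illustration of the structure case (a sketch with a hypothetical `struct sample`, assuming C11), the compiler chooses every field offset and inserts the padding itself. The exact numbers depend on the ABI, so the checks are written in terms of `alignof` rather than fixed offsets.

```c
#include <assert.h>
#include <stdalign.h>  /* alignof */
#include <stddef.h>    /* offsetof, size_t */

/* Hypothetical struct mixing fields with different alignment needs. */
struct sample {
    char   c;
    double d;
    int    i;
};

/* Bytes of padding the compiler inserted between c and d. */
size_t padding_before_d(void)
{
    return offsetof(struct sample, d) - sizeof(char);
}

/* Returns 1 if every field sits at a multiple of its own alignment and
   the struct size is a multiple of the struct's alignment, so that
   arrays of struct sample stay aligned too. */
int fields_are_naturally_placed(void)
{
    return offsetof(struct sample, d) % alignof(double) == 0
        && offsetof(struct sample, i) % alignof(int) == 0
        && sizeof(struct sample) % alignof(struct sample) == 0;
}
```

Every access to `d` or `i` is then compiled with these offsets baked into the instruction, which is why changing them later would mean patching code.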
In order to defer this responsibility to the linker, the compiler would have to generate ad-hoc code and a lot of meta-data so that the linker could patch the necessary instructions.
Accommodating the limited linker interface would lead to the generation of sub-optimal code.
Enriching the linker capabilities would shift the compiler-linker boundary to the left, effectively making the latter "sorta" a small compiler.
The loader has no say over a program's data: it has to load any program regardless of how it accesses its data, and it tries to treat code and data as opaquely as possible.
In particular, the loader usually fills in or rewrites the executable's meta-data but not the code or the data.
Making the code go through meta-data every time it reads a struct field would be a huge performance hit for no good reason.
The hardware has no concept of structures nor of the intentions of the program.
When instructed to read from X it will do its best to read from X as fast and correctly as possible but it will assign no meaning to that X.
The hardware does what it is told to do.
If it can't, the condition is signalled. The x86 architecture has very relaxed alignment requirements, at the cost of potentially doubling (or worse) the latency of the operation.
The compiler takes the full responsibility for aligning the data.
The two lemmas that come in handy when doing so are[1]:
1. If an object a is X-aligned with respect to a Y-aligned object b and X | Y (Y is a multiple of X), then a is X-aligned with respect to the same reference as b.
For example, the sections in a PE/ELF file (and to some extent even `malloc`'d buffers) can be loaded aligned at a specific boundary (8 bytes, 16 bytes, 4KiB and so on).
If a section is loaded aligned at 4KiB then all the power-of-two alignments up to 2^12 are automatically respected, once in memory, even if they are taken with respect to the start of the section, no matter where the section is loaded.
2. In a buffer B of length 2X-1 there is at least one address A that is X-aligned and such that 2X-1 - (A-B) >= X (it has enough space to hold an object of size X).
If you need to align an object at an 8-byte boundary and that object is 8 bytes in length (as it usually is) then allocating a buffer of 16-1 = 15 bytes will guarantee that a suitable address is present for every possible start address of the buffer.
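Both lemmas can be checked with plain integer arithmetic. The sketch below uses made-up helper names and simulated addresses (integers, not real pointers), and assumes X and Y are powers of two.

```c
#include <assert.h>
#include <stdint.h>  /* uintptr_t */

/* Rounds addr up to the next multiple of align (align: power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* First lemma: if base is Y-aligned, X divides Y and offset is a multiple
   of X, then base + offset is X-aligned. Returns 1 when the premises and
   the conclusion all hold for the given values. */
int lemma1_holds(uintptr_t base, uintptr_t offset, uintptr_t x, uintptr_t y)
{
    if (base % y != 0 || y % x != 0 || offset % x != 0)
        return 0;  /* premises not met */
    return (base + offset) % x == 0;
}

/* Second lemma: whatever the start address of a (2X-1)-byte buffer, the
   first X-aligned address inside it still has at least X bytes after it.
   Tries X consecutive start addresses, which covers every residue mod X. */
int lemma2_holds(uintptr_t x)
{
    for (uintptr_t start = 4096; start < 4096 + x; ++start) {
        uintptr_t a = align_up(start, x);
        uintptr_t room = (2 * x - 1) - (a - start);
        if (a % x != 0 || room < x)
            return 0;
    }
    return 1;
}
```

For instance, `align_up(13, 8)` gives 16, which skips 3 bytes and leaves 12 of the 15 bytes, enough for an 8-byte object.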
Thanks to these two lemmas and an established convention with the loader, the compiler can fulfil its duties without reaching out to other tools.
[1] Given without too much explanation.

-
What can I say, it's the best explanation on the topic... I guess I would need more experience in debugging to understand whole of it... Thank you for such awesome explanation. – user2338150 Feb 24 '17 at 19:30
-
This isn't totally correct - the linker and loader (aka runtime linker) also have to deal with alignment: otherwise how could alignment for global data be respected? – BeeOnRope Feb 27 '17 at 02:35
-
@BeeOnRope The ELF/PE file formats specify the alignment of the sections and the loader has to respect it. And the linker has similar duties when merging two object files. This is not, however, alignment of fields or variables, as these are high-level concepts. – Margaret Bloom Feb 27 '17 at 07:30
-
It is directly related to the alignment of fields and variables: try making variables with different alignments and the section alignment changes. So the linker and loader need to be part of the alignment puzzle. Of course they may not know **why** a particular section is aligned to some value, but they are still willing to help out. – BeeOnRope Feb 27 '17 at 14:31
-
@BeeOnRope We agree on that. It's stated in the first lemma above. – Margaret Bloom Feb 27 '17 at 14:46
-
The examples in the lemma look wrong: `malloc` **definitely** doesn't align all allocations to 4K: that would imply that every allocation takes at least that much space! In practice small allocations are usually aligned to 8 or 16 bytes. As a consequence, `malloc` doesn't usually return memory suitably aligned for "over aligned" objects (e.g., where you used `#pragma align` or other alignment attributes to increase alignment). – BeeOnRope Feb 27 '17 at 15:28
-
... and then the section loading is also wrong. Sections are most definitely _not_ loaded to a page boundary. That would also be inefficient! Check out the sections in a very simple binary I listed below: there are fully 26 sections, 24 of them directly loadable. They would take 24 4K pages if they had to be at a page boundary. What actually occurs is that the pages for the exe are mapped as-is and so multiple sections may occur on the same page. Some pages may be mapped two or more times if the sections they contain imply different permissions. – BeeOnRope Feb 27 '17 at 15:32
-
... all that to say that the linker needs to be aware of alignment since it needs to place the sections at appropriate v and f offsets within the file, and there is an explicit ALGN field stored in the file which helps it with that (it isn't entirely clear to me if it is ever used by the loader though: the sections should already appear at a virtual offset that satisfies the alignment - perhaps it only checks ALGN as an integrity check? I dunno). – BeeOnRope Feb 27 '17 at 15:35
-
@BeeOnRope I wasn't sure about the malloc (as I wrote), but effectively your simple reasoning is right! Regarding ELF, after reading again the ELF file format, I believe you are right again :) The sections must have virtual addr and physical offsets congruent modulo 4KiB, but not aligned on 4KiB. The linker must definitely be able to handle alignment (as a simple `readelf -s` shows) but it is still the compiler that instructs the linker on the proper alignment to use, so I like to see it as an agnostic feature (pretty much like the `and` instruction). Thanks for the feedback! – Margaret Bloom Feb 27 '17 at 17:06
-
Yes, definitely. You can say that the compiler is fully aware of all the alignment rules and enforces them everywhere, most of which is invisible to the linker, but because of global data being embedded in the exe some alignment behavior leaks through and the compiler must pass on some information to the linker, which doesn't know much about C, but is "alignment aware" in a very general way (i.e., through the ALGN field in the ELF format, etc). – BeeOnRope Feb 27 '17 at 18:11
-
I had originally claimed even the loader was alignment aware, but as far as I can tell in ELF, it just maps pages from the executable which already has its sections relatively aligned (the whole file offset congruent to the v offset modulo 4K thing) so it doesn't need to do anything special (although I'm not sure what happens if you specify an alignment like 2**13 which is larger than 4K). So maybe the ALGN field is never used at runtime, or maybe it is there to support loaders that do things differently. – BeeOnRope Feb 27 '17 at 18:13