
Whose task is data-structure alignment: the compiler's, the linker's, the loader's, or the hardware's itself, as in the case of x86? Does the compiler use relative aligned addressing such that, once the linker places everything correctly in the compiled executable, the data structures are always aligned to their respective native-size boundaries? What further tasks does the loader have to do after that?

user2338150

3 Answers


The answer is that both the compiler and the linker[0] need to understand and handle alignment requirements. The compiler is the smart one of the pair, as only it understands the actual structure, stack and variable alignment rules, but it propagates some information about required alignment to the linker, which also needs to respect it when generating the final executable.

The compiler takes care of a lot of run-time alignment handling and, conversely, also often relies on certain minimum alignments being met[1]. The existing answers here cover what the compiler does in some detail.

What is missing is that the linker and loader framework also deal with alignment. Generally speaking, each section has a minimum alignment attribute: the linker writes that attribute and the loader respects it, ensuring that the section is loaded on a boundary at least as aligned as that attribute.

Different sections will have different requirements, and changes to the code can affect those directly. A simple example is global data, whether it is in the .bss, .rodata, .data or some other section. These sections will have an alignment at least as large as the largest alignment requirement for any object stored therein.

So if you have a read-only (const) global object with 64-byte alignment, your .rodata section will have a minimum alignment of 64 bytes, and the linker will ensure this requirement is met.
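As an illustration (a sketch, not from the original answer; the variable names are hypothetical), the C11 file below declares over-aligned globals that GCC and Clang typically place in .rodata, .data and .bss respectively. Compiling it with `gcc -c` and running `objdump -h` on the object should show those sections' Algn raised to at least 2**6 and 2**5; the run-time check confirms that the final addresses honour the requests.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical globals that request more than their natural alignment.
 * GCC and Clang typically emit them into .rodata, .data and .bss, and
 * raise each section's alignment to at least the largest request. */
alignas(64) const unsigned char ro_table[64] = { 1, 2, 3 }; /* .rodata */
alignas(32) int rw_counters[4] = { 1, 2, 3, 4 };            /* .data   */
alignas(32) char scratch[100] = { 0 };                      /* .bss    */

/* Nonzero if p is a multiple of align. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Checks the run-time addresses: they honour the requests because the
 * linker placed each section on a suitably aligned boundary. */
int globals_are_aligned(void)
{
    return is_aligned(ro_table, 64) &&
           is_aligned(rw_counters, 32) &&
           is_aligned(scratch, 32);
}
```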

You can use `objdump -h` to see the actual alignment requirements of the sections in any object file, in the Algn column. Here's a random example:

```
Sections:
Idx Name          Size      VMA               LMA               File off  Algn  Flags
  0 .interp       0000001c  0000000000400238  0000000000400238  00000238  2**0  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  0000000000400254  0000000000400254  00000254  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.gnu.build-id 00000024  0000000000400274  0000000000400274  00000274  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000030  0000000000400298  0000000000400298  00000298  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       00000288  00000000004002c8  00000000004002c8  000002c8  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       00000128  0000000000400550  0000000000400550  00000550  2**0  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000036  0000000000400678  0000000000400678  00000678  2**1  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 00000050  00000000004006b0  00000000004006b0  000006b0  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.dyn     00000060  0000000000400700  0000000000400700  00000700  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.plt     00000210  0000000000400760  0000000000400760  00000760  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .init         0000001a  0000000000400970  0000000000400970  00000970  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .plt          00000170  0000000000400990  0000000000400990  00000990  2**4  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt.got      00000008  0000000000400b00  0000000000400b00  00000b00  2**3  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .text         000021e2  0000000000400b10  0000000000400b10  00000b10  2**4  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .fini         00000009  0000000000402cf4  0000000000402cf4  00002cf4  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .rodata       00000700  0000000000402d00  0000000000402d00  00002d00  2**5  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame_hdr 000000b4  0000000000403400  0000000000403400  00003400  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .eh_frame     000003d4  00000000004034b8  00000000004034b8  000034b8  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
 18 .init_array   00000008  0000000000603e10  0000000000603e10  00003e10  2**3  CONTENTS, ALLOC, LOAD, DATA
 19 .fini_array   00000008  0000000000603e18  0000000000603e18  00003e18  2**3  CONTENTS, ALLOC, LOAD, DATA
 20 .jcr          00000008  0000000000603e20  0000000000603e20  00003e20  2**3  CONTENTS, ALLOC, LOAD, DATA
 21 .dynamic      000001d0  0000000000603e28  0000000000603e28  00003e28  2**3  CONTENTS, ALLOC, LOAD, DATA
 22 .got          00000008  0000000000603ff8  0000000000603ff8  00003ff8  2**3  CONTENTS, ALLOC, LOAD, DATA
 23 .got.plt      000000c8  0000000000604000  0000000000604000  00004000  2**3  CONTENTS, ALLOC, LOAD, DATA
 24 .data         00000020  00000000006040d0  00000000006040d0  000040d0  2**4  CONTENTS, ALLOC, LOAD, DATA
 25 .bss          000001c8  0000000000604100  0000000000604100  000040f0  2**5  ALLOC
 26 .comment      00000034  0000000000000000  0000000000000000  000040f0  2**0  CONTENTS, READONLY
```

The alignment requirements here vary from 2**0 (no alignment needed) to 2**5 (align on a 32-byte boundary).

Beyond the candidates you mentioned, the runtime also needs to be alignment aware. This topic is somewhat complex, but basically you can be sure that `malloc` and related functions return memory suitably aligned for any fundamental type (which usually means 8- or 16-byte alignment on 64-bit systems; glibc on x86-64, for example, uses 16), although things get more complicated when you are talking about over-aligned types, or C++ `alignas`.
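A minimal C11 sketch of the distinction (the type and function names are hypothetical): plain `malloc` only promises `alignof(max_align_t)`, while a type that asks for more than that, here 64 bytes, needs `aligned_alloc`. In C11 the size passed to `aligned_alloc` must be a multiple of the alignment, which holds here because the struct is exactly 64 bytes.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A hypothetical over-aligned type: its alignment (64) is larger than
 * alignof(max_align_t) on common platforms, so malloc may not suffice. */
typedef struct {
    alignas(64) double lanes[8];
} cache_line_t;

/* Nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* malloc only promises alignof(max_align_t); for over-aligned types use
 * C11 aligned_alloc, whose size must be a multiple of the alignment.
 * Returns NULL on failure or if the alignment is unsupported. */
cache_line_t *new_cache_line(void)
{
    return aligned_alloc(alignof(cache_line_t), sizeof(cache_line_t));
}
```

Memory obtained from `aligned_alloc` is released with the ordinary `free`.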


[0] I had originally just grouped the (compile-time) linker and (run-time) loader together, as they are really two sides of the same coin (and indeed much of the linking is actually run-time linking). After looking more carefully into the loading process, however, it seems that the loader may just load the segments (sections) at their existing file offsets, automatically respecting the alignment set up by the linker.

[1] Less so on platforms like x86, where unaligned access is usually allowed; but on platforms where alignment restrictions are stricter, code may actually fail if incorrect alignment is encountered.

BeeOnRope

The shortest answer that I believe to be correct: it's the compiler's job.

This is why there are various #pragmas and other compiler-level magic knobs you can twist to control alignment, when you must.
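One such knob, sketched below: `#pragma pack` is a compiler extension (accepted by GCC, Clang and MSVC, but not part of standard C) that changes the layout the compiler computes. The struct and function names are hypothetical; `offsetof` shows where the compiler put each member.

```c
#include <stddef.h>

/* Natural layout: the compiler inserts padding after `c` so that `i`
 * starts at a multiple of alignof(int) (4 on common ABIs). */
struct natural {
    char c;
    int  i;
};

/* Packing to 1 removes the padding, so `i` may be misaligned; when the
 * member is accessed directly through the struct, the compiler emits
 * code that copes with that. */
#pragma pack(push, 1)
struct packed {
    char c;
    int  i;
};
#pragma pack(pop)

/* Padding bytes the compiler inserted between the two members. */
size_t natural_padding(void)
{
    return offsetof(struct natural, i) - sizeof(char);
}
```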

I don't think the C language specifies the existence of those other components (linker and loader).

unwind
  • Well, linking is hinted at in the C11 standard, even the word linker is mentioned in a footnote; and other mentions like "Translation units may be separately translated and then later linked to produce an executable program." - of course the standard doesn't mandate that a separate linker must exist. – Antti Haapala -- Слава Україні Feb 24 '17 at 09:52
  • It is understandable that the compiler designers cannot say anything about the linker, much less the loader, but could I say that the actual mechanics depend on the implementation, which includes the linker and, at a lower level, the loader? – user2338150 Feb 24 '17 at 09:56
  • They don't need to be mentioned to be alignment-aware: the standard says how things have to work, then the toolchain implements it in a way that makes sense. In most cases, this means that the entire toolchain, from the compiler to the linker through to the loader/runtime linkers, all need to understand alignment. Indeed, the standard doesn't even require a linker or a compiler (you could have a conforming interpreted C, for example), but in practice that's exactly how it is implemented on most platforms. – BeeOnRope Feb 27 '17 at 02:45

Data alignment is tightly coupled with code generation.
Consider the burden of generating the prologue and epilogue of a function whose local variables are aligned on some boundary.

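The two listings below are generated from the same function, compiled for 32-bit x86 with a different assumed stack alignment (32 bytes for the left one, 4 bytes for the right one). Because the right one cannot assume the incoming stack is aligned enough for its `double` locals, it has to realign `esp` itself (`and esp, -8`) and save the original stack pointer, which lengthens both the prologue and the epilogue.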

```
foo(double):                         foo(double):
   push    ebp                          lea     ecx, [esp+4]
   mov     ebp, esp                     and     esp, -8
   sub     esp, 40                      push    DWORD PTR [ecx-4]
   mov     eax, DWORD PTR [ebp+8]       push    ebp
   mov     DWORD PTR [ebp-40], eax      mov     ebp, esp
   mov     eax, DWORD PTR [ebp+12]      push    ecx
   mov     DWORD PTR [ebp-36], eax      sub     esp, 20
   fld1                                 mov     eax, ecx
   fstp    QWORD PTR [ebp-8]            fld1
   fld     QWORD PTR [ebp-40]           fstp    QWORD PTR [ebp-16]
   fstp    QWORD PTR [ebp-16]           fld     QWORD PTR [eax]
   fld     QWORD PTR [ebp-8]            fstp    QWORD PTR [ebp-24]
   fmul    QWORD PTR [ebp-16]           fld     QWORD PTR [ebp-16]
   leave                                fmul    QWORD PTR [ebp-24]
   ret                                  add     esp, 20
                                        pop     ecx
                                        pop     ebp
                                        lea     esp, [ecx-4]
                                        ret
```

While this example concerns the alignment of the stack, its purpose is to show the complications that arise.
Structure alignment works the same way.
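To reproduce this cost from C, here is a minimal sketch (the function name is hypothetical) with a local that requests 32-byte alignment via C11 `alignas`, more than most ABIs guarantee for the incoming stack pointer. Compiled with optimization disabled (e.g. `gcc -O0 -S`), GCC and Clang typically emit an `and` on the stack pointer in the prologue, much like the right-hand listing above; the function itself only checks that the request was honoured.

```c
#include <stdalign.h>
#include <stdint.h>

/* The local asks for 32-byte alignment, more than the incoming stack
 * alignment most ABIs guarantee (4 or 16 bytes). The compiler must
 * realign the frame itself (typically `and esp, -32` or `and rsp, -32`
 * in the prologue, undone in the epilogue), so the check always holds. */
int local_is_32_aligned(double x)
{
    alignas(32) double buf[4] = { x, x, x, x };
    uintptr_t addr = (uintptr_t)buf;
    return addr % 32 == 0 && buf[3] == x;
}
```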

To defer this responsibility to the linker, the compiler would have to generate ad-hoc code and a lot of metadata so that the linker could patch the necessary instructions.
Accommodating the limited linker interface would lead to sub-optimal code.
Enriching the linker's capabilities instead would shift the compiler-linker boundary to the left, effectively turning the linker into a small compiler of sorts.

The loader has no insight into a program's data: it has to load any program regardless of how it accesses its data, treating code and data as opaquely as possible.
In particular, the loader usually fills in or rewrites the executable's metadata, but not the code or the data.
Making the code consult metadata every time it reads a struct field would kill performance for no good reason.

The hardware has no concept of structures, nor of the intentions of the program.
When instructed to read from X, it will do its best to read from X as quickly and correctly as possible, but it assigns no meaning to that X.
The hardware does what it is told to do.
If it can't, the condition is signalled. The x86 architecture has very relaxed alignment requirements, at the cost of potentially doubling (or worse) the latency of a misaligned operation.
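On the software side, the portable way to touch data that may be misaligned is to copy it with `memcpy` rather than dereference a cast pointer. A minimal sketch (the helper names are hypothetical): the compiler is free to turn the `memcpy` into whatever is safe on the target, which on x86 is usually a single `mov`.

```c
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from a possibly misaligned address. Dereferencing
 * (const uint32_t *)p directly is undefined behaviour when p is not
 * suitably aligned and can fault on strict-alignment CPUs. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The store counterpart, with the same reasoning. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```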


The compiler takes full responsibility for aligning the data.
The two lemmas that come in handy when doing so are[1]:

  • If an object a is X-aligned with respect to an object b, b is Y-aligned, and X | Y (Y is a multiple of X), then a is also X-aligned with respect to the same reference that b is Y-aligned to.

    For example, the sections in a PE/ELF file (and, to some extent, `malloc`ed buffers) can be loaded aligned on a specific boundary (8 bytes, 16 bytes, 4 KiB and so on).
    If a section is loaded aligned on 4 KiB, then all the power-of-two alignments up to 2^12 are automatically respected once in memory, even if they were computed relative to the start of the section, no matter where the section is loaded.

  • In a buffer starting at address B with length 2X-1, there is at least one address A that is X-aligned and such that 2X-1 - (A-B) >= X (i.e. enough space remains after A to hold an object of size X).

    If you need to align an object on an 8-byte boundary and that object is 8 bytes long (as is usual), then allocating a buffer of 16-1 = 15 bytes guarantees that a suitable address is present, whatever the start address of the buffer.
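Both lemmas can be checked mechanically. The sketch below (hypothetical helper names; addresses are modelled as plain integers, and `x` must be a power of two) rounds an address up with the usual mask trick and tests each lemma's conclusion under its stated preconditions.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of x (x must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    return (addr + x - 1) & ~(x - 1);
}

/* Lemma 1: base is y-aligned, offset is x-aligned relative to base and
 * x divides y, so base + offset is x-aligned in absolute terms. */
static int lemma1_holds(uintptr_t base, uintptr_t offset,
                        uintptr_t x, uintptr_t y)
{
    assert(base % y == 0 && offset % x == 0 && y % x == 0);
    return (base + offset) % x == 0;
}

/* Lemma 2: a buffer of 2x - 1 bytes starting at b contains an x-aligned
 * address a with at least x bytes left after it. */
static int lemma2_holds(uintptr_t b, uintptr_t x)
{
    uintptr_t a = align_up(b, x);
    uintptr_t len = 2 * x - 1;
    return a % x == 0 && len - (a - b) >= x;
}
```

Lemma 2 is also the idea behind a hand-rolled aligned allocator: over-allocate by X - 1 bytes and round the start up with something like `align_up`.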

Thanks to these two lemmas and an established convention with the loader, the compiler can fulfil its duties without reaching out to other tools.


[1] Given without much explanation.

Margaret Bloom
  • What can I say, it's the best explanation on the topic... I guess I would need more experience in debugging to understand whole of it... Thank you for such awesome explanation. – user2338150 Feb 24 '17 at 19:30
  • This isn't totally correct - the linker and loader (aka runtime linker) also has to deal in alignment: otherwise how could alignment for global data be respected? – BeeOnRope Feb 27 '17 at 02:35
  • @BeeOnRope The ELF/PE file format specifies the alignment of the sections and the loader has to respect it. And the linker has similar duties when merging two object files. This is not, however, alignment of fields or variables, as those are high-level concepts. – Margaret Bloom Feb 27 '17 at 07:30
  • It is directly related to the alignment of fields and variables: try making variables with different alignments and the section alignment changes. So the linker and loader need to be part of the alignment puzzle. Of course they may not know **why** a particular section is aligned to some value, but they are still willing to help out. – BeeOnRope Feb 27 '17 at 14:31
  • @BeeOnRope We agree on that. It's stated in the first lemma above. – Margaret Bloom Feb 27 '17 at 14:46
  • The examples in the lemma look wrong: `malloc` **definitely** doesn't align all allocations to 4K: that would imply that every allocation takes at least that much space! In practice small allocations are usually aligned to 8 or 16 bytes. As a consequence, `malloc` doesn't usually return memory suitably aligned for "over aligned" objects (e.g., where you used `#pragma align` or other alignment attributes to increase alignment). – BeeOnRope Feb 27 '17 at 15:28
  • ... and then the section loading is also wrong. Sections are most definitely _not_ loaded to a page boundary. That would also be inefficient! Check out the sections in a very simple binary I listed in my answer: there are fully 26 sections, 24 of them directly loadable. They would take 24 4K pages if they had to be at a page boundary. What actually occurs is that the pages for the exe are mapped as-is and so multiple sections may occur on the same page. Some pages may be mapped two or more times if the sections they contain imply different permissions. – BeeOnRope Feb 27 '17 at 15:32
  • ... all that to say that the linker needs to be aware of alignment since it needs to place the sections at appropriate virtual and file offsets within the file, and there is an explicit ALGN field stored in the file which helps it with that (it isn't entirely clear to me if it is ever used by the loader though: the sections should already appear at a virtual offset that satisfies the alignment - perhaps it only checks ALGN as an integrity check? I dunno). – BeeOnRope Feb 27 '17 at 15:35
  • @BeeOnRope I wasn't sure about the malloc (as I wrote), but your simple reasoning is effectively right! Regarding ELF, after rereading the ELF file format, I believe you are right again :) The sections must have virtual addresses and file offsets congruent modulo 4KiB, but not aligned on 4KiB. The linker must definitely be able to handle alignment (as a simple `readelf -s` shows) but it is still the compiler that instructs the linker on the proper alignment to use, so I like to see it as an agnostic feature (pretty much like the `and` instruction). Thanks for the feedback! – Margaret Bloom Feb 27 '17 at 17:06
  • Yes, definitely. You can say that the compiler is fully aware of all the alignment rules and enforces them everywhere, most of which is invisible to the linker, but because global data is embedded in the exe, some alignment behavior leaks through and the compiler must pass on some information to the linker, which doesn't know much about C, but is "alignment aware" in a very general way (i.e., through the ALGN field in the ELF format, etc). – BeeOnRope Feb 27 '17 at 18:11
  • I had originally claimed even the loader was alignment aware, but as far as I can tell in ELF, it just maps pages from the executable which already has its sections relatively aligned (the whole file offset congruent to the v offset modulo 4K thing) so it doesn't need to do anything special (although I'm not sure what happens if you specify an alignment like 2**13 which is larger than 4K). So maybe the ALGN field is never used at runtime, or maybe it is there to support loaders that do things differently. – BeeOnRope Feb 27 '17 at 18:13
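The congruence rule discussed in these comments can be observed at run time. The sketch below assumes Linux with glibc: `dl_iterate_phdr` and the `ElfW` macro come from `<link.h>` and are extensions, not standard C, and the struct and function names are hypothetical. For every PT_LOAD segment of every loaded object (executable, shared libraries, vDSO) it checks the ELF rule that `p_vaddr` and `p_offset` are congruent modulo `p_align`, which is what lets the loader map file pages directly.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Counters filled in by the callback below. */
struct load_stats {
    size_t loads;       /* PT_LOAD segments seen                  */
    size_t violations;  /* segments where vaddr, offset disagree  */
};

/* Called once per loaded object; checks
 * p_vaddr % p_align == p_offset % p_align for each PT_LOAD segment
 * (p_align of 0 or 1 means no alignment is required). */
static int check_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct load_stats *st = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        st->loads++;
        if (ph->p_align > 1 &&
            ph->p_vaddr % ph->p_align != ph->p_offset % ph->p_align)
            st->violations++;
    }
    return 0; /* keep iterating over the remaining objects */
}

/* Walks all currently loaded objects and returns the counters. */
struct load_stats check_loaded_segments(void)
{
    struct load_stats st = { 0, 0 };
    dl_iterate_phdr(check_object, &st);
    return st;
}
```

The same relationship is visible in the `objdump -h` listing in the first answer, e.g. .init_array at VMA 0x603e10 and file offset 0x3e10.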