Why does the size of a struct need to be a multiple of the largest alignment of any struct member?

Question

I understand the padding that takes place between the members of a struct to ensure correct alignment of the individual types. However, why does the size of the whole struct have to be a multiple of the alignment of its largest member? I don't understand why padding is needed at the end.

Reference: http://en.wikipedia.org/wiki/Data_structure_alignment

thb · Accepted Answer · 2012-04-25T15:07:06.333

Good question. Consider this hypothetical type:

struct A {
    int n;
    bool flag;
};

You might expect an object of type A to take five bytes (four for the int plus one for the bool), but on typical platforms it takes eight. Why?

The answer is seen if you use the type like this:

const size_t N = 100;
A a[N];

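If each A were only five bytes, then a[0] would be aligned, but a[1], a[2], and most of the other elements would not. Array elements are contiguous, so a[1].n would start at byte offset 5, which is not a multiple of four.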

But why does alignment even matter? There are several reasons, all hardware-related. One reason is that recently/frequently used memory is cached in cache lines on the CPU silicon for rapid access. An aligned object smaller than a cache line always fits in a single line (but see the interesting comments appended below), but an unaligned object may straddle two lines, wasting cache.

There are actually even more fundamental hardware reasons, having to do with the way byte-addressable data is transferred down a 32- or 64-bit data bus, quite apart from cache lines. Not only will misalignment clog the bus with extra fetches (due as before to straddling), but it will also force registers to shift bytes as they come in. Even worse, misalignment tends to confuse optimization logic (at least, Intel's optimization manual says that it does, though I have no personal knowledge of this last point). So, misalignment is very bad from a performance standpoint.

For these reasons, the wasted padding bytes are usually worth it.

Update: The comments below are all useful. I recommend them.

*An aligned object smaller than a cache line always fits in a single line, but an unaligned object may straddle two lines* > **No**. Whether aligned or not an object might straddle two lines. — Matthieu M., Apr 25 '12 at 07:10
@MatthieuM., actually yes and no. Cache-line sizes are multiples of the largest data size, and any other fundamental type. Therefore all (er, most) aligned _native_ types will naturally be within a single cache-line. Consider that any 1,2,4,8,16 byte aligned type will automatically be aligned to fit within a cache-line of 64 or 128 bytes. A system would essentially be unusable if this were not the case. — edA-qa mort-ora-y, Apr 25 '12 at 07:25
@edA-qamort-ora-y: sure, but an object being a compound of fundamental types might easily straddle two lines. Even if it is smaller. Supposing a 64 bytes lines, I can have an object 48 bytes large, and in a table of such objects, at least one out of two will straddle two cache lines. — Matthieu M., Apr 25 '12 at 08:17
@MatthieuM., yes, I was just trying to clarify for native types. Compound types will certainly cross lines, but their constituent native types will not. — edA-qa mort-ora-y, Apr 25 '12 at 09:59

score 1 · Answer 2 · answered Apr 25 '12 at 07:18

Depending on the hardware, alignment might be strictly necessary or might just help speed up execution.

There are some processors (ARM, I believe) on which an unaligned access leads to a hardware exception. Plain and simple.

Even though typical x86 processors are more lenient, there is still a penalty for accessing unaligned fundamental types, as the processor has to do more work to bring the bits into a register before it can operate on them. Compilers usually offer specific attributes/pragmas for when packing is desirable nonetheless.

score -1 · Answer 3 · answered Apr 25 '12 at 03:52

Because of virtual addressing.

"...aligning a page on a page-sized boundary lets the hardware map a virtual address to a physical address by substituting the higher bits in the address, rather than doing complex arithmetic."

By the way, I found the Wikipedia page on this quite well written.

score -1 · Answer 4 · answered Apr 25 '12 at 03:53

If the register size of the CPU is 32 bits, then it can grab memory that is on a 32-bit boundary with a single assembly instruction. It is slower to grab 32 bits and then extract the byte that starts at bit 8.

BTW: There doesn't have to be padding. You can ask that structures be packed.

Steve Wellens

Packing is compiler-specific, not part of the language. Packing will blow up on RISC machines if the CPU/kernel support for handling misaligned loads and stores is not there or not turned on. Alignment isn't just for speed; it's a hard requirement on some machines. – Kaz Apr 25 '12 at 04:36
