
I roughly know about alignment and have read cppreference's "Objects and alignment" and Wikipedia's "Data structure alignment". However, I still have some doubts. I'm mainly interested in C++, but the question applies to C too, as it uses mostly the same rules for alignment.

I know that padding is added to make data access more efficient, because on some architectures accessing a value at an address that is a multiple of its size is faster/better (alignment).

  • Is that the only reason why padding is used?

If so, consider the following structures:

struct A {
  int i;
  char c;
};
struct B {
  struct A a;
  char d;
};

On my architecture (x86_64), the compiler places 3 bytes of padding at the end of A so that sizeof(A)==8 and sizeof(A[2])==16, and another 3 bytes of padding at the end of B, so that sizeof(B)==12.
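The tail padding described above can be measured rather than guessed. The following is a minimal sketch in C11; the helper names `tail_padding_A` and `tail_padding_B` are made up for illustration, and the actual numbers depend on the platform (3 and 3 are what a typical ABI with a 4-byte, 4-byte-aligned int would give).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct A {
  int i;
  char c;
};
struct B {
  struct A a;
  char d;
};

/* The size of a type is always a multiple of its alignment. */
static_assert(sizeof(struct A) % _Alignof(struct A) == 0,
              "sizeof is a multiple of the alignment");

/* Hypothetical helper: bytes of padding after the last member of A,
   i.e. total size minus the offset where member c ends. */
size_t tail_padding_A(void) {
  return sizeof(struct A) - (offsetof(struct A, c) + sizeof(char));
}

/* Hypothetical helper: the same measurement for B. */
size_t tail_padding_B(void) {
  return sizeof(struct B) - (offsetof(struct B, d) + sizeof(char));
}
```

Printing both values with `%zu` on the target platform shows how many bytes each struct spends on trailing padding.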

I understand that padding A to 8 bytes makes storing it in an array more efficient. But it doesn't seem to be useful at all when A is placed inside B.

If everything so far is correct, then I'm wondering:

  • Why is padding placed at the end of types, instead of being limited to the space between elements of aggregate types (e.g. struct or array) and never placed at the end?

An example of what I mean: wouldn't it be better if the compiler decided that sizeof(A)==5, sizeof(B)==6, sizeof(A[2])==13 (3 bytes of padding between the elements, but not at the end)?
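To make the arithmetic of this hypothetical scheme concrete, here is a small sketch in C. It does not describe how C or C++ actually work; `round_up` and `hypothetical_sizeofarr` are invented names, and the unpadded size (5) and alignment (4) are passed in by hand because the real `sizeof` already includes the tail padding.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align. */
size_t round_up(size_t size, size_t align) {
  assert(align != 0);
  return (size + align - 1) / align * align;
}

/* Hypothetical rule: padding sits only between elements, never after
   the last one. The first n-1 elements occupy a full padded stride,
   the last one only its unpadded size. */
size_t hypothetical_sizeofarr(size_t unpadded_size, size_t align, size_t n) {
  if (n == 0) {
    return 0;
  }
  return (n - 1) * round_up(unpadded_size, align) + unpadded_size;
}
```

With an unpadded size of 5 and an alignment of 4, the stride is 8 and two elements need 8 + 5 = 13 bytes, which matches the sizeof(A[2])==13 figure above.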

Helloer
  • You need to maintain the alignment if you stack up multiple instances of the structure. For example in an array. – user4581301 Jan 05 '21 at 19:18
  • Hint: Arrays aren't allowed to contain padding between their elements. – Joseph Sible-Reinstate Monica Jan 05 '21 at 19:18
  • Guys, answer in the answer section please – Asteroids With Wings Jan 05 '21 at 19:20
  • Surprised I can't find a good duplicate for this. Here's some related reading though that contains the answer: [C++ Data Member Alignment and Array Packing](https://stackoverflow.com/questions/1676385/c-data-member-alignment-and-array-packing) – user4581301 Jan 05 '21 at 19:28
  • There could be some benefit to being able to pack structures into other structures like this. However, given a declaration `struct A *p;`, tell us what `sizeof *p` should produce. If it returns `sizeof(struct A)` using your size without padding, then `struct A *p = malloc(3 * sizeof *p);` will not allocate enough space for 3 `struct A` objects to be put into an array indexed by `p[0]`, `p[1]`, and `p[2]`, assuming arrays are laid out to keep their elements properly aligned. How do you suggest changing the C language to accommodate this? – Eric Postpischil Jan 05 '21 at 19:48
  • @EricPostpischil: the C language could have defined pointer arithmetic differently: `arr[i]` would still be the same as `*(arr+i)`, but it could be computed like this: `*(T*)( (char*)arr + i*(sizeof(T) + padding) )`. For functions that require the size in bytes of an array (e.g. `malloc`, `memset`, etc.) you could compute the size with `sizeofarr(T, n)`. – Helloer Jan 05 '21 at 19:55
  • Of course that's not how C/C++ currently works, and changing it now would be impossible (it'd destroy backward compatibility). But I wonder why C hasn't been defined that way, and whether what I'm suggesting would have problems. – Helloer Jan 05 '21 at 19:55
  • The question mentions performance as the reason for padding, but that isn't the only reason. There are architectures where data access *must* be done with the proper alignment or it won't work at all. – Michael Burr Jan 05 '21 at 20:13
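The `malloc(3 * sizeof *p)` point raised in the comments above can be sketched as follows. This is only an illustration in C11 of why the current rule is convenient; `three_elements_fit` is a made-up name, and the function simply checks that three elements occupy exactly the bytes that were requested.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* ptrdiff_t */
#include <stdlib.h>   /* malloc, free */

struct A {
  int i;
  char c;
};

/* Because sizeof includes the tail padding, n * sizeof(T) is exactly
   the size of an array of n elements. */
static_assert(sizeof(struct A[3]) == 3 * sizeof(struct A),
              "3 * sizeof *p covers exactly three elements");

/* Hypothetical check: returns 1 if p[0], p[1] and p[2] fit in the
   allocation, 0 if they do not or if malloc fails. */
int three_elements_fit(void) {
  struct A *p = malloc(3 * sizeof *p);
  if (p == NULL) {
    return 0;
  }
  p[0].i = 10;
  p[1].i = 20;
  p[2].i = 30;
  /* One past the last element is exactly the end of the allocation. */
  int ok = ((char *)(p + 3) - (char *)p) == (ptrdiff_t)(3 * sizeof *p);
  ok = ok && (p[0].i + p[1].i + p[2].i == 60);
  free(p);
  return ok;
}
```

If `sizeof` returned the unpadded size instead, the same `malloc` call would come up short for the last element unless arrays themselves were allowed to be misaligned.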

1 Answer


Consider an architecture where 4-byte alignment of int is required (or desired for performance). Now consider the following structure:

struct S {
    int i;
    char c;
};

There probably won't be any padding between i and c. But now think about what happens if you define something like:

struct S array[10];

Since arrays are not allowed to have any padding between their elements, this padding has to be added to the S structure itself, at its end (3 bytes after c), to maintain the proper alignment of each element of array.
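As a rough check of this reasoning, the sketch below (C11) walks a 10-element array and verifies that every element starts at a properly aligned address. `count_aligned_elements` is a made-up helper, and testing alignment through a `uintptr_t` conversion is a common technique that assumes the usual flat address mapping rather than something the standard spells out.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

struct S {
    int i;
    char c;
};

/* Hypothetical helper: counts how many of the 10 elements start at an
   address that is a multiple of the alignment of struct S. */
size_t count_aligned_elements(void) {
    struct S array[10];
    size_t aligned = 0;
    for (size_t k = 0; k < 10; ++k) {
        /* Elements are contiguous: element k starts k * sizeof(struct S)
           bytes after the start of the array. */
        assert((char *)&array[k] == (char *)array + k * sizeof(struct S));
        if ((uintptr_t)&array[k] % _Alignof(struct S) == 0) {
            ++aligned;
        }
    }
    return aligned;
}
```

All 10 elements come out aligned precisely because the tail padding makes `sizeof(struct S)` a multiple of the alignment, so stepping by `sizeof(struct S)` never leaves an aligned address.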

Eugene Sh.
  • I guess that inserting padding between elements of an array would break pointer arithmetic: `&A[0] + 1` must equal `&A[1]`... But wouldn't it be enough to redefine pointer arithmetic so that adding an integer to a pointer jumps by `paddedsizeof(T)` bytes instead of `sizeof(T)`? – Helloer Jan 05 '21 at 19:31
  • @Helloer Simpler indexing math, most likely `&Arr[1]` is `Arr +1`, not `Arr +1 + padding`. – user4581301 Jan 05 '21 at 19:32
  • @Helloer The formal answer would be: because the Standard defines arrays that way :) One of the rationales, for example, is that you can allocate a certain amount of memory for an array of a specific object type using only the information about its size, and be certain it will fit there (as in `malloc(10 * sizeof(struct S))`). – Eugene Sh. Jan 05 '21 at 19:33
  • @user4581301: the meaning of `&Arr[1]` and `Arr+1` was chosen by the language standard, which is also what chose how to deal with padding. Couldn't they choose that `&Arr[1] == Arr+1 == Arr + paddedsizeof(T)` ? – Helloer Jan 05 '21 at 19:34
  • Yes, using `malloc` or `memset` on arrays would indeed be more complicated. – Helloer Jan 05 '21 at 19:35
  • They could, but why tack on an extra case? One rule to fit everything usually results in simpler and faster code. – user4581301 Jan 05 '21 at 19:35
  • @user4581301: Clearly it doesn't in this case. It results in `sizeof(B)` being 12 bytes instead of 8, so 50% more cache misses. – Mooing Duck Jan 06 '21 at 01:11
  • @Helloer: You are talking about the C++ restriction that `sizeof` yields a multiple of the `alignof` value, described as "padding" in online language references. In my opinion it could be dropped, as long as devs are informed of any code breakage. The padded size could then be computed explicitly: `size_t aligned_sizeof = sizeof(type) + ((sizeof(type) % alignof(type) == 0) ? 0 : alignof(type) - sizeof(type) % alignof(type));` – rplgn Feb 15 '22 at 06:23