-4

While playing around with Structure-Padding I found something weird...

At first glance it seems off that the Structure's size isn't the sum of its Members' sizes, and that Structures are padded differently based on whether it's inside of an Array or isn't:

Code

typedef struct {
    char c;
    double d;
    int i;
} test_struct;

int main() {
    printf("Size of Struct: %d\n", sizeof(test_struct));
    test_struct t1, t2;
    printf("Offset between Structs: %d\n", (long long) &t1 - (long long) &t2);
    test_struct arr[2];
    printf("Offset between Structs in Array: %d\n",  (long long) &arr[1] - (long long) &arr[0]);
}

Output

(64bit-system)

Size of Struct: 24
Offset between Structs: 32
Offset between Structs in Array: 24
TK36
  • 5
    The offset between `t1` and `t2` is completely meaningless. There could be any number of bytes between the two. And it's not even guaranteed that `t1` is at the higher address. – user3386109 Jan 30 '22 at 00:16
  • 3
    @user3386109 It's also [undefined behavior](https://port70.net/~nsz/c/c11/n1570.html#6.5.6p9): "When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object" – Andrew Henle Jan 30 '22 at 00:22
  • 1
    @AndrewHenle: There is no subtracting of pointers in the code. Conversion of pointers to integer types is implementation-defined, and the subtraction of integers is defined in the absence of overflow. – Eric Postpischil Jan 30 '22 at 00:42
  • @user3386109 cool? what does that change? exactly! nothing about the explanation changed and the effect won't be measured, because the output is already posted – TK36 Jan 30 '22 at 01:22
  • *"Structures are padded differently based on whether it's inside of an Array or isn't"* No, they are not. You're confused because you found that the spacing between the structures in the array is 24, and the spacing between `t1` and `t2` is 32. What you don't seem to understand is that the spacing between `t1` and `t2` is meaningless, and has nothing to do with structure padding. – user3386109 Jan 30 '22 at 01:31

2 Answers

0

I describe the way a compiler typically lays out a structure here.

printf("Size of Struct: %d\n", sizeof(test_struct));

sizeof produces a result of type size_t. It should be printed with %zu, not %d. Once that is correct, this will print the number of bytes in the structure.

Note that sizeof is an operator, not a function, so an operand that is an expression does not need parentheses: sizeof t1 is valid. If its operand is a type, as in sizeof(test_struct), it does need to be in parentheses, for reasons of C grammar.

test_struct t1, t2;
printf("Offset between Structs: %d\n", (long long) &t1 - (long long) &t2);

The compiler is free to place t1 and t2 where it wants in memory, subject to alignment rules and other considerations. They do not have to be adjacent to each other. long long values should be printed with %lld, not %d.

In C implementations with flat address spaces, conversion of a pointer to an integer will usually produce the expected address, and so subtracting in such an implementation will produce the offset between the addresses. However, this is not guaranteed by the C standard and is not true in all C implementations.

printf("Offset between Structs in Array: %d\n", sizeof(arr) / sizeof(*arr));

Dividing the size of an array by the size of an element produces the number of elements in the array, not the size of an element. And again, size_t values should be printed with %zu, not %d.

Eric Postpischil
  • While I agree with some points which I will correct, I don't quite see your point often... 1. No sizeof doesn't return size_t, as sizeof is an operator, hence working without size_t defined. Though they will be the same, technically sizeof doesn't return size_t 2. Using "%d" doesn't break anything so I see no problem with that 3. While they don't HAVE to be put next to each other they will be and it's good for explaining – TK36 Jan 30 '22 at 00:59
  • @TK36: Re 1., “No sizeof doesn't return size_t”: C 2018 6.5.3.4 5 says, of `sizeof` and `_Alignof` “The value of the result of both operators is implementation-defined, and its type (an unsigned integer type) is `size_t`,…” Whether the name `size_t` is in scope or not is irrelevant to the fact there is some type that `sizeof` returns, and it is not `int` (it cannot be `int` because it is an unsigned type). – Eric Postpischil Jan 30 '22 at 02:02
  • @TK36: Re 2., “Using "%d" doesn't break anything so I see no problem with that”: It can break things when `int` and `size_t` are different sizes. C 2018 7.21.6.1 9 says “… If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.” – Eric Postpischil Jan 30 '22 at 02:03
  • @TK36: Re 3., “While they don't HAVE to be put next to each other they will be…”: Quite obviously they are not in the very case the OP asks about; their output shows they are separated by 8 bytes. (The output shows their addresses differ by 32 bytes, 8 more than the size of one of them.) – Eric Postpischil Jan 30 '22 at 02:05
  • I AM OP :)))))) – TK36 Jan 30 '22 at 02:10
  • @TK36: The output shows what the output shows. – Eric Postpischil Jan 30 '22 at 02:20
-2

The goal of Structure-Padding and Member-Alignment is to have all Members at "natural Address" in Memory.
Variable x is at a "natural Address" if &x % sizeof(x) == 0, i.e. if its Address is a multiple of its size.

Processors read Memory in Words; 32bit-systems often read Memory in Words of 32bits/4bytes and similarly 64bit-systems often read Memory in Words of 64bits/8bytes.
To ensure that reading one Variable can be done by reading a minimal number of Words, the compiler aligns them.

This boosts performance, as it cuts down on Word-accesses by the CPU. However it wastes some Memory as Padding.
Under extreme circumstances you might want to consider using the pack-pragma.

Quick Sidenote: a Pointer is typically 1 Word in size.


Size

sizeof(test_struct) returns 24 because Members within the Struct get aligned like this:

struct {
    char c; // 1 byte
    char pad1[7]; // so d is at byte 8 from the beginning (multiple of 8, d's size)
    double d; // 8 bytes
    int i; // 4 bytes
    char pad2[4]; // so consecutive Structs also have d at multiple of 8 globally
};

Offset

The above only generates a Structure with "naturally aligned" Members if the Struct itself is located at a multiple of 8. This applies generally: consecutive "correctly" padded Structs have all their Members at "natural Addresses" only if the first Structure is at a "natural Address" of its biggest Member-Type.

C's biggest Primitive is long double (80bits/10bytes on 32bit & 128bit/16bytes on 64bit). From what we just learned we can conclude that placing Structures at Addresses which are multiples of long double's size guarantees that all Members of said Structure are correctly aligned. Hence the compiler in this example is putting Struct-Variables at Addresses which are multiples of 16. A Struct won't shrink in size, so the second Struct will be placed after the 24 bytes of the first Struct + 8 bytes of offset, totaling 32 bytes of offset between the two.

If you're wondering whether this wastes additional Memory, whether this Padding could simply be added to the end of the Struct as well, or whether #pragma pack(1) also prevents this:

No, and this is because C will actually squeeze in other, primitive, small enough Variables in front of Structs and hence there's no real downside anyway.


Offset in Array

When examining pad2 you might realize that the Comment only holds true under the assumption that the Struct itself is at a Memory-Address which is a multiple of 8.
Again speaking generally: at a multiple of its biggest Member-Type's size.

Arrays of Structs always contain Objects of the same Type. So by adding a Padding at the end to make the total size a multiple of the biggest Member's size we can be certain all other Structs following will be aligned just like the first of the sequence is.

The first will be put at an Address which is a multiple of 16 as discussed previously. Hence adding this Padding makes our Structs efficient in both Space & Time, because we aren't forced into multiples of 16 but instead can put Structs right next to each other.

You might realize that using Arrays of Structs which don't contain long doubles, rather than multiple Variables of that Struct, can actually be a bit more Memory-efficient if you don't happen to have many small Variables to fit into the gaps between your hypothetical Struct-Variables. For most use-cases this will probably be irrelevant, yet grasping why it's true shows understanding of Structure-Padding.


I hope this elaboration on the topic helps.
If you're still confused, take a look at more explanations here or other great material on the topic here.

TK36
  • 1
    You have `char[7] pad1;` — that isn't valid C. Presumably you meant `char pad1[7];`? – Jonathan Leffler Jan 30 '22 at 00:19
  • *Under extreme circumstances you might want to consider using the pack-pragma.* If you don't understand padding and addressing, you probably should never use `#pragma pack` in any form. And if you do understand padding and addressing, you probably never will use it because it all too easily leads to undefined behavior. – Andrew Henle Jan 30 '22 at 00:24
  • Re “Variable x is at a "natural Address" if `&x % sizeof(x)`”: Alignments are implementation-defined, and the C standard does not require that alignment be a multiple of the object size. (The converse is true.) For example, a `double` could be eight bytes but only require four-byte alignment. – Eric Postpischil Jan 30 '22 at 00:45
  • Re “C's biggest Primitive is long double (80bits/10bytes on 32bit & 128bit/16bytes on 64bit)”: The sizes of most objects are implementation-defined. `long long int` could be bigger than `long double`, for example. – Eric Postpischil Jan 30 '22 at 00:46
  • 1
    Re “Processors read Memory in Words. Those are sized as indicated by your system; 32bit-systems read Memory in Words of 32bits/4bytes and similarly 64bit-systems read Memory in Words of 64bits/8bytes.”: There is no single thing that makes a system 32-bit or 64-bit; those are colloquial terms, not technical definitions. Processors have several features that can have various widths, including the bus, general registers, instruction data width, addresses. These may be mixed; a processor may have instructions that operate on 64 bits of data, use 48-bit addresses, and have 32-bit registers. – Eric Postpischil Jan 30 '22 at 00:49
  • Re “a "natural Address" of its biggest Member-Type”: A structure’s biggest member is not necessarily the one with the strictest alignment requirement. A 10-byte `long double` might require only four-byte alignment while an eight-byte `long int` could require eight-byte alignment. Or a million-byte `char` array could require only one-byte alignment while a four-byte `int` could require four-byte alignment. – Eric Postpischil Jan 30 '22 at 01:03
  • @everyone: yeah maybe technically alignment isn't defined as a multiple or something like that. but realistically it will be, it's easier to explain it this way and pretty much all explanations you'll find of this use similar style – TK36 Jan 30 '22 at 01:19
  • @Eric Postpischil yepp, i changed the wording – TK36 Jan 30 '22 at 01:19
  • @Jonathan Leffler had to use Java a lot lately ;) – TK36 Jan 30 '22 at 01:20