3

We know that there is padding in some structures in C. Please consider the following 2:

struct node1 {
      int a;
      int b;
      char c;
};

struct node2 {
      int a;
      char c;
      int b;
};

Assuming sizeof(int) = alignof(int) = 4 bytes:
sizeof(node1) = sizeof(node2) = 12, due to padding.

What is the performance difference between the two? (if any, w.r.t. the compiler or the architecture of the system, especially with GCC)

Lundin
  • 195,001
  • 40
  • 254
  • 396
Naman
  • 43
  • 6
  • 2
    Highly recommended by whom? – Adrian Mole Sep 10 '20 at 08:08
  • I was asked this in an interview which would be better to use, but I didn't know. So the interviewer said that it is always recommended to use node1, as those which are padded should be at the end of the structure, but didn't give the reason. So I have asked it up here. – Naman Sep 10 '20 at 08:15
  • 1
    Having the padding at the end of the struct instead of between two members is generally not significant, unless it's useful to be able to copy or initialize both int members with one 8-byte copy or store. In general putting larger members first is standard advice, but here it doesn't actually make a difference. [How do I organize members in a struct to waste the least space on alignment?](https://stackoverflow.com/q/56761591) - as I said in my answer, don't apply those rules without thinking. Good job thinking here. – Peter Cordes Sep 10 '20 at 08:16
  • Thanks for the kind suggestions. Have edited the question, on whether there is a performance difference or not. – Naman Sep 10 '20 at 08:20
  • This also may be important if there's multiple unaligned variables inside your struct: `struct node1{int a;char b;char c};` would be 8 bytes while `struct node2{char a;int b;char c};` would be 12 – Denis Sheremet Sep 10 '20 at 08:28
  • @DenisSheremet `multiple unaligned variables` there are no unaligned members in those structs. – 0___________ Sep 10 '20 at 08:30
  • @DenisSheremet: I hope the OP is only asking about cases where the total amount of padding is already minimized. That's what makes this question interesting and not a duplicate of [How do I organize members in a struct to waste the least space on alignment?](https://stackoverflow.com/q/56761591) – Peter Cordes Sep 10 '20 at 08:34
  • 1
    IMO it has marginal (if any) performance difference. I would call it premature optimization attempt. There is no one always valid rule how to order struct members. – 0___________ Sep 10 '20 at 08:36
  • There have been in the past, inside the [GCC](http://gcc.gnu.org) compiler, an optimization pass which reordered fields inside some `struct`-s – Basile Starynkevitch Sep 10 '20 at 09:13
  • This question makes me wonder if there have ever been any compilers with extreme memory optimizers that would give `node2` a size of 9, and if possible use the remaining 3 bytes for `char` or `smallint` variables that happen to be within the same data segment or stack frame. – Ruud Helderman Sep 10 '20 at 09:53
  • 1
    @RuudHelderman - That would throw off member alignment if you tried to allocate an array of these structs. – Vilx- Sep 10 '20 at 13:38
  • @Vilx- You are right. Even if static arrays would be correctly aligned and padded by the compiler, code would still break on the idiomatic `malloc(1000 * sizeof(node1))`. Well, there goes another potential argument in favor of `node1` out of the window. – Ruud Helderman Sep 10 '20 at 13:55

4 Answers4

4

These are bad examples - in this case it doesn't matter, since the amount of padding will be the same in either case. There will not be any performance differences.

The compiler will always strive to fill up trailing padding at the end of a struct or otherwise using arrays of structs wouldn't be feasible, since the first member should always be aligned. If not for trailing padding in some item struct_array[0], then the first member in struct_array[1] would end up misaligned.


The order would matter if we were to do this though:

struct node3 {
      int  a;
      char b;
      int  c;
      char d;
};

Assuming 4 byte int and 4 byte alignment, then b occupies 1+3 bytes here, and d an additional 1+3 bytes. This could have been written better if the two char members were placed adjacently, in which case the total amount of padding would just have been 2 bytes.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • They're bad examples for [How do I organize members in a struct to waste the least space on alignment?](https://stackoverflow.com/q/56761591), but that doesn't seem to be what the OP is asking. They're asking about the special case where the total struct size *doesn't* depend on the order of the members. It's plausible that having the padding at the end could in practice lead to worse code-gen with some compilers, but that would probably just be a missed optimization, not an inherent / cpu-architecture limitation. (e.g. GCC often tries to avoid writing padding) – Peter Cordes Sep 10 '20 at 08:44
  • @PeterCordes I can't think of any existing compiler or CPU where the two examples would result in different code though. – Lundin Sep 10 '20 at 08:52
  • GCC8 does do store merging, like if you did `s1.a = s1.b = 0`, gcc for x86-64 would use a single qword store. But for node2 it would need two separate dword stores; those members wouldn't be contiguous. I'm not saying your answer is wrong, I'm just not entirely happy with the way your answer is mostly about other cases where the struct sizes *aren't* the same. I thought the OP already knew that and was talking about the special case where the amount of padding is the same but the location is different. Worth mentioning yes, and agreed there's not much to say about the actual question. – Peter Cordes Sep 10 '20 at 09:09
  • @PeterCordes Except, if the program just as well needs to use one int + char pair at the same time and have those loaded in the same instruction. Though I suppose it may or may not need to mask out padding bytes, depending on the situation. Using `uint_fast8_t` instead of `char` might have eliminated the problem with indeterminate value padding bytes, as the compiler would then be free to allocate the variable value across 4 bytes. – Lundin Sep 10 '20 at 09:16
  • Yup, sure, if GCC is willing to overwrite the padding with a store, that pairing works, too/instead. So it depends on how you use the struct which grouping might optimize better. So there's no way to say which is better without more info, but they are different. – Peter Cordes Sep 10 '20 at 09:28
1

I would not be surprised if the interviewer's opinion was based on the old argument of backward compatibility when extending the struct in the future. Additional fields (char, smallint) may benefit from the space occupied by the trailing padding, without the risk of affecting the memory offset of the existing fields.

In most cases, it's a moot point. The approach itself is likely to break compatibility, for two reasons:

  1. Starting the extensions on a new alignment boundary (as would happen to node2) may not be memory-optimal, but it might well prevent the new fields from accidentally being overwritten by the padding of a 'legacy' struct.
  2. When compatibility is that much of an issue (e.g. when persisting or transferring data), then it makes more sense to serialize/deserialize (even if binary is a requirement) than to depend on a binary format that varies per architecture, per compiler, even per compiler option.
Ruud Helderman
  • 10,563
  • 1
  • 26
  • 45
1

OK, I might be completely off the mark here since this is a bit out of my league. If so, please correct me. But this is how I see it:

First of all, why do we need padding and alignment at all? It's just wasted bytes, isn't it? Well, turns out that processors like it. That is, if you issue an instruction to the CPU that operates on a 32-bit integer, the CPU will demand that this integer resides at a memory address which is dividable by 4. For a 64-bit integer it will need to reside in an address dividable by 8. And so on. This is done to make the CPU design simpler and better performant.

If you violate this requirement (aka "unaligned memory access"), most CPUs will raise an exception. x86 is actually an oddity because it will still perform the operation - but it will take more than twice as long because it will fetch the value from memory in two passes rather than one and then do bitwise magic to stick the value together from these separate accesses.

So this is the reason why compilers add padding to structs - so that all the members would be properly aligned and the CPU could access them quickly (or at all). Well, that's assuming the struct itself is located at a proper memory address. But it will also take care of that as long as you stick to standard operations for allocating the memory.

But it is possible to explicitly tell the compiler that you want a different alignment too. For example, if you want to use your struct to read in a bunch of data from a tightly packed file, you could explicitly set the padding to 1. In that case the compiler will also have to emit extra instructions to compensate for potential misalignment.

TL;DR - wrong alignment makes everything slower (or under certain conditions can crash your program entirely).

However this doesn't answer the question "where to better put the padding?" Padding is needed, yes, but where? Well, it doesn't make much difference directly, however by rearranging your members carefully you can reduce the size of the entire struct. And less memory used usually means a faster program. Especially if you create large arrays of these structs, using less memory will mean less memory accesses and more efficient use of CPU cache.

In your example however I don't think there's any difference.

P.S. Why does your struct end with a padding? Because arrays. The compiler wants to make sure that if you allocate an array of these structs, they will all be properly aligned. Because array members don't have any padding between them.

Vilx-
  • 104,512
  • 87
  • 279
  • 422
0

What is the performance difference between the two?

The performance difference is "indeterminable". For most cases it won't make any difference.

For cases where it does make a difference; either version might be faster, depending on how the structure is used. For one example, if you have a large array of these structures and frequently select a structure in the array "randomly"; then if you only access a and b of the randomly selected structure the first version can be faster (because a and b are more likely to be in the same cache line), and if you only access a and c then the second version can be faster.

Brendan
  • 35,656
  • 2
  • 39
  • 66