If a C compiler pads a structure in order to align the fields to their native alignment, and that structure is then initialized, is the padding initialized to zero?

For example the following structure:

typedef struct foo_t_ {
    int  a;
    char b;
    int  c;
    char d;
} foo_t;

On many systems this (poorly designed) structure would have a sizeof(foo_t) of 16, with a total of 6 bytes of padding, 3 bytes after each of the chars.
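To check the layout on a particular compiler rather than assume it, one option is to compare `sizeof` with the sum of the member sizes and look at `offsetof` for each member. The sketch below repeats the struct from above; `foo_padding_bytes` and `print_foo_layout` are hypothetical helper names, and the numbers they report depend on the ABI (16 and 6 are typical where `int` is 4 bytes, but not guaranteed).

```c
#include <stddef.h>
#include <stdio.h>

typedef struct foo_t_ {
    int  a;
    char b;
    int  c;
    char d;
} foo_t;

/* Hypothetical helper: total padding = object size minus the member sizes. */
static size_t foo_padding_bytes(void)
{
    return sizeof(foo_t)
         - (sizeof(int) + sizeof(char) + sizeof(int) + sizeof(char));
}

/* Hypothetical helper: print where each member lands inside the struct. */
static void print_foo_layout(void)
{
    printf("sizeof(foo_t) = %zu\n", sizeof(foo_t));
    printf("offsets: a=%zu b=%zu c=%zu d=%zu\n",
           offsetof(foo_t, a), offsetof(foo_t, b),
           offsetof(foo_t, c), offsetof(foo_t, d));
    printf("padding bytes = %zu\n", foo_padding_bytes());
}
```

Calling `print_foo_layout()` from `main` shows the gaps directly: any jump between one member's end and the next member's offset is padding.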

If we initialize the structure like:

foo_t foo = { .a = 1, .b = '2' };

then the field foo.a will be set to 1 and foo.b will be set to the character '2'. The unspecified fields ('foo.c' and 'foo.d') will automatically be set to 0. The question is, what happens to the 6 bytes of padding? Will they also automatically be set to 0, or is it undefined behavior?

The use case is that I will be calculating hashes of data structures:

foo_t foo = { .a = 1, .b = '2' };
foo_t bar = { .a = 1, .b = '2' };
uint32_t hash_foo = calc_hash(&foo, sizeof(foo));
uint32_t hash_bar = calc_hash(&bar, sizeof(bar));

and I want to be sure that hash_foo and hash_bar are the same. I could guarantee this by first using memset() to clear the structures, then initializing them, but it seems cleaner to use C initialization instead.
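For reference, the memset() variant I have in mind looks roughly like the sketch below. The question does not show calc_hash, so a 32-bit FNV-1a over the raw bytes is used here as a stand-in, and `foo_init` is a hypothetical helper name. Note the assumption: C11 §6.2.6.1/6 says padding bytes take unspecified values when a member is stored, so this relies on compilers leaving padding alone on plain member stores rather than on a strict guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct foo_t_ {
    int  a;
    char b;
    int  c;
    char d;
} foo_t;

/* Stand-in for calc_hash(): 32-bit FNV-1a over the raw object bytes. */
static uint32_t calc_hash(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: zero every byte (padding included), then assign
 * the members. c and d keep the zero written by memset. */
static void foo_init(foo_t *f, int a, char b)
{
    memset(f, 0, sizeof *f);
    f->a = a;
    f->b = b;
}
```

With this, two objects set up through `foo_init(&foo, 1, '2')` and `foo_init(&bar, 1, '2')` should hash identically on the compilers I have tried, which is the behavior I would like to get from plain initialization.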

In practice, GCC on my system does clear the padding as well, but I don't know if that is guaranteed.

Sourav Ghosh
N. Leavy
  • Wouldn't referencing that memory be UB, in any case? – Don Reba Jun 05 '16 at 12:43
  • Why not make the hash a function of the actual struct members with no dependence on the padding? – John Coleman Jun 05 '16 at 12:50
  • In your example, are you referring to a case where the variable is being declared locally in a function, or globally? – dear_tzvi Jun 05 '16 at 12:50
  • The first question of this implicit two-questions-question (:-/) is a duplicate of this: http://stackoverflow.com/q/13056364/694576 – alk Jun 05 '16 at 12:53
  • They are defined locally. – N. Leavy Jun 05 '16 at 12:53
  • It makes no sense to include pad bytes in your hash value calculation. Just extract the actual defined fields and use those for your hash function. The difference in speed shouldn't be very significant. – Tom Karzes Jun 05 '16 at 12:54
  • If I make my hash function aware of the specific fields of the object, then my hash function is no longer generic - and typically it is also slower. But if necessary to get defined behaviour then that may be the way to go (or else memset to 0 initially) – N. Leavy Jun 05 '16 at 12:57
  • I see what you're trying to do. But there's also the case where someone allocates the structure locally (on the stack), and explicitly assigns the fields of the structure (outside of an initializer). In that case, I would expect the pad bytes to contain garbage values. – Tom Karzes Jun 05 '16 at 13:02
  • Another thing you could do is have your hash function take an `int` or `char` array for the hash arguments, and then let the caller manually pack the structure members into the array for use as an argument to the hash function. It's a little clunky, but it would solve your problem. – Tom Karzes Jun 05 '16 at 13:07
  • As @TomKarzes wrote: don't include the padding bytes. You can make your function generic by providing the type and offset of each field in the `struct` in an array. There are automatic tools to do this (or rather, to create both the array and the `struct` definition from a description file). – too honest for this site Jun 05 '16 at 13:25
  • Sourav, why did you correct his spelling to USA English? It's perfectly fine to spell initialise and behaviour anywhere else except for USA. – Mirakurun Jun 05 '16 at 13:42
  • @N.Leavy: Regardless of their initialised status and the space saving, under certain circumstances accessing padding bytes produces a bus error signal that will terminate your program. (SIGBUS) – Aesin Jun 05 '16 at 15:09

1 Answer

In general, as per C11, chapter §6.2.6.1/6, whenever a value is stored in a structure or union:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

But if partial initialization is done, then the rest of the members are initialized as if they were objects with static or thread storage duration. Quoting the same standard, chapter §6.7.9/21:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

and regarding the implicit initialization of objects with static storage duration, paragraph 10 of the same chapter (§6.7.9/10) says:

If an object that has static or thread storage duration is not initialized explicitly, then:

  • if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

So, in your case, the padding for the remaining members is guaranteed to be 0, but not the padding for the members which have received initializers.

So, overall, you should not depend on implicit initialization to 0; use memset().

That being said, in any case it's neither recommended nor required to depend on padding bytes, if any. Use the exact member variables and calculate the hash based on those values.
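A minimal sketch of hashing only the members, while keeping the hash routine generic, is to describe each member by its offset and size and feed just those byte ranges to the hash. Everything named here (`field_desc`, `FIELD`, `foo_fields`, `fnv1a_update`, `hash_fields`) is hypothetical, and 32-bit FNV-1a is only a stand-in for whatever hash you actually use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct foo_t_ {
    int  a;
    char b;
    int  c;
    char d;
} foo_t;

/* Hypothetical descriptor: where a member lives and how large it is. */
typedef struct field_desc {
    size_t offset;
    size_t size;
} field_desc;

/* sizeof does not evaluate its operand, so the null pointer is never read. */
#define FIELD(type, member) \
    { offsetof(type, member), sizeof(((type *)0)->member) }

static const field_desc foo_fields[] = {
    FIELD(foo_t, a), FIELD(foo_t, b), FIELD(foo_t, c), FIELD(foo_t, d),
};

/* One 32-bit FNV-1a pass over len bytes, continuing from state h. */
static uint32_t fnv1a_update(uint32_t h, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}

/* Hash only the bytes that belong to members; padding is never read. */
static uint32_t hash_fields(const void *obj, const field_desc *fields, size_t n)
{
    const unsigned char *base = obj;
    uint32_t h = 2166136261u;      /* FNV offset basis */
    for (size_t i = 0; i < n; i++)
        h = fnv1a_update(h, base + fields[i].offset, fields[i].size);
    return h;
}
```

A call such as `hash_fields(&foo, foo_fields, sizeof foo_fields / sizeof foo_fields[0])` then gives the same result for two objects with equal members, whatever their padding bytes contain. It still hashes the object representation of each member, so it does not address padding bits inside a member or values with more than one representation.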

Sourav Ghosh
  • `memset` only sets bytes. But not all types have an "all-bits zero" encoding for the value `0`, namely floating-point types and pointers (_null pointers_); `memset` is quite useless in such cases. Also the second part (please add paragraph numbers!) is about padding **bits** in the members themselves, not padding bytes between the members of the aggregate (i.e. `struct`; arrays have no padding between objects). – too honest for this site Jun 05 '16 at 13:02
  • @Olaf sir, thanks for the comment, I've appended the chapters to the answer. Regarding the all-zero encoding and padding between array members, I agree with you, but in this case, only the padding bits are applicable, aren't they? – Sourav Ghosh Jun 05 '16 at 13:12
  • I think OP asks about the padding bytes between members, not the bits in each member. But I'll leave a comment at the question, too. – too honest for this site Jun 05 '16 at 13:24
  • @Olaf I'm not very sure I did follow you there. What is the difference between `padding bytes between members` and `the bits in each member`? I'm a _bit_ confused. :) – Sourav Ghosh Jun 05 '16 at 13:32
  • See 6.2.6.2p1/2 for padding **bits**. They are included in the `sizeof(Type_Of_Member)`. The padding **bytes** are for alignment of each member and are not. – too honest for this site Jun 05 '16 at 13:35
  • I would suggest using brace initialization, since the standard specifies the behavior and side effects for that, while for `memset` there is no guarantee that it will do an observable write to the padding bits. – saagarjha Sep 06 '20 at 19:16