
I have a program that writes a constant struct to a binary file. I need it to be an exact match of another binary file that was created in a different manner. However, each time I run my executable, the resulting binary file is different. I need the produced file to be the same every time.

The code to reproduce the problem:

main.c:

#include <stdio.h>

typedef struct {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
} Struct_type;

void CreateStruct(Struct_type* Struct_instance) {
  Struct_instance->vector1[0] = 0.0;
  Struct_instance->vector2[0] = 0.0;
  Struct_instance->vector1[1] = 0.0;
  Struct_instance->vector2[1] = 0.0;
  Struct_instance->vector1[2] = 0.0;
  Struct_instance->vector2[2] = 0.0;
  Struct_instance->a_uint32_field = 0U;
  Struct_instance->a_uint8_field = 0U;
}

int main() {
  Struct_type Struct_instance;
  FILE* file_pointer;

  CreateStruct(&Struct_instance);

  file_pointer = fopen("Saved_Struct.bin", "wb");
  fwrite((void*)&Struct_instance, sizeof(Struct_instance), 1, file_pointer);
  fclose(file_pointer);
  
  return (0);
}

Compile with:

gcc -o executable main.c -m32 -O0

Then run:

./executable

The first time I ran it, the file ended with the hexadecimal bytes \AE\CB\FF; the second time it ended with \F4\9C\FF. Deleting the older file or letting fopen overwrite it makes no difference. The file was supposed to end with all zeros, that is: \00\00\00

Why does this happen? How can I prevent it?

Mefitico
  • Padding will hurt you - try zeroing the struct before reading and writing it. Something like `memset(&Struct_instance, 0, sizeof Struct_instance);` – Ted Lyngmo Sep 03 '21 at 17:28
  • In other words, try adding `memset(Struct_instance, 0, sizeof(*Struct_instance))` at the beginning of `CreateStruct`. – Steve Summit Sep 03 '21 at 17:31
  • I wonder if `Struct_type Struct_instance = {0};` will do the trick? Probably not guaranteed. – Eugene Sh. Sep 03 '21 at 17:32
  • @EugeneSh. Fair question, it wouldn't do the trick to solve my actual problem, this is just a minimum working example to reproduce the problem. The `CreateStruct` function is a stand-in for something much more complex. – Mefitico Sep 03 '21 at 17:34
  • I think a safer option is to serialize the data. That is, write each field of the struct to the file individually. That should make any padding irrelevant, and then the only worry about comparability would be if different systems had different sized primitive types. – yano Sep 03 '21 at 17:36
  • Mefitico: Complex or not, @SteveSummit's suggestion cost almost nothing. Just copy/paste it into the function where the struct is originally created. – Ted Lyngmo Sep 03 '21 at 17:37
  • @TedLyngmo: I'm testing the ideas here, but just to be clear, I'm not sure if Eugene's idea was to replace the whole function with the {0} assignment (wouldn't work) or just to initialize the struct with {0} and then call the function (which I think solves the problem). – Mefitico Sep 03 '21 at 17:40
  • @Mefitico You can try `= {0}` first. I'm not 100% sure it'll zero out the padding bytes. `memset` _will_ however do that. Then, after zeroing out the struct, populate the values. So, adding the `memset` (or perhaps `= {0}`) first in the function creating the struct would be a way forward. – Ted Lyngmo Sep 03 '21 at 17:42
  • _However, each time I run my executable, the resulting binary file is different._ Is this with the same executable each time you run? If so, I don't think that can be explained away with padding, sounds like something else is going on. Padding shouldn't change between different runs of the same executable. If you're changing compiler flags, using a different system, then sure. – yano Sep 03 '21 at 17:42
  • @yano yes it is with the same executable, but I was thinking maybe the lack of padding was causing garbage to be written or some unused memory not being overwritten. – Mefitico Sep 03 '21 at 17:43
  • @yano The struct is an automatic variable, so I would not be surprised if the padding bytes were indeterminate. Not sure if it aligns with http://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p6 though – Eugene Sh. Sep 03 '21 at 17:43
  • You also need to think about what @yano wrote about serialization. If you compile this program on a different platform, your types may be stored differently and have different sizes. Always use fixed width types when writing binary files - and think about endianness. – Ted Lyngmo Sep 03 '21 at 17:44
  • @EugeneSh. Wouldn't the executable contain an assembly instruction to advance the stack pointer x number of bytes to reserve space for automatic variables? How could that change from one invocation to the next? – yano Sep 03 '21 at 17:47
  • @yano Garbage data on the stack where the struct is located. If the padding bytes are not initialized to anything, they will retain this garbage. – Eugene Sh. Sep 03 '21 at 17:49
  • @EugeneSh. ahh, you're talking about the _content_ of the padding. I was only considering the _amount_ of padding, whoops. Gotcha... yeah nevermind, that absolutely could explain different output files from run to run. – yano Sep 03 '21 at 17:51
  • The `= {0}` initializer will initialize the padding to all zero bits. See 6.7.9/21 which says that if there are fewer initializers than structure members, the remainder of the structure shall be initialized like an object that has static storage duration. And 6.7.9/10 says *"if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;"* – user3386109 Sep 03 '21 at 18:29
  • Print the size of the struct and compare it with your expectation. You might like to read on `#pragma pack`, too. However, this is not portable; but writing binary is already not very portable due to different possible encodings. – the busybee Sep 03 '21 at 19:42
  • Now that @user3386109 has settled the `= {0}` part, `Struct_type Struct_instance = {0};` _should_ make a big difference - or rather, make the result predictable (as long as you stay on the same platform). – Ted Lyngmo Sep 03 '21 at 19:53
  • @user3386109 While padding can be set to zero by tricks like a short initializer, memset, calloc, etc. there is no guarantee that they will keep that value. – Support Ukraine Sep 03 '21 at 21:13

1 Answer


The problem is padding bytes, whose values you can't control. For your specific struct, it's likely that there are 3 padding bytes at the end, which would match the three trailing bytes that change between runs. They can hold any value.
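To check that guess on a given compiler and set of flags, the layout can be inspected with sizeof and offsetof. The following is a minimal sketch; the helper names trailing_padding and print_layout are made up for illustration, and the numbers it reports depend on the platform and ABI.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
} Struct_type;

/* Hypothetical helper: number of bytes between the end of the last
   member and the end of the struct. */
size_t trailing_padding(void) {
  size_t end_of_last_member =
      offsetof(Struct_type, a_uint8_field) + sizeof(unsigned char);
  return sizeof(Struct_type) - end_of_last_member;
}

/* Hypothetical helper: prints the layout so it can be compared with the
   size of the file that fwrite produces. */
void print_layout(void) {
  printf("sizeof(Struct_type)      = %zu\n", sizeof(Struct_type));
  printf("offset of a_uint32_field = %zu\n",
         offsetof(Struct_type, a_uint32_field));
  printf("offset of a_uint8_field  = %zu\n",
         offsetof(Struct_type, a_uint8_field));
  printf("trailing padding bytes   = %zu\n", trailing_padding());
}
```

Calling print_layout() from main with the same flags as the real build (for example -m32) shows how many bytes of the written file come from padding rather than from members.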

This is described in 6.2.6.1/6 (draft N1570):

"When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values."

So even if you start out with zero bits in the padding by using a short initializer, calloc, memset, etc., that only holds until the first assignment to the struct or to one of its members.

For further reading: Is it guaranteed that the padding bits of "zeroed" structure will be zeroed in C?

The only way to be sure of getting the same binary file every time is to get rid of the padding before writing the file. One way to do that is to use a packed struct. See for instance What is a "packed" structure in C?
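As a rough illustration, assuming GCC or Clang (the packed attribute is a compiler extension, not standard C, and the name Packed_struct_type is invented here), a packed version of the struct could look like this:

```c
#include <stddef.h>

/* __attribute__((packed)) is a GCC/Clang extension: it asks the compiler
   to lay the members out back to back with no padding between or after
   them. Other compilers use different syntax, e.g. #pragma pack. */
typedef struct __attribute__((packed)) {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
} Packed_struct_type;

/* Fails the build if any padding is left in the packed struct. */
_Static_assert(sizeof(Packed_struct_type) ==
                   2 * sizeof(double[3]) + sizeof(unsigned int) +
                       sizeof(unsigned char),
               "Packed_struct_type still contains padding");
```

With no padding left, a single fwrite of the whole object writes only member bytes. The trade-off is that packed members may be misaligned, so taking pointers to them needs care.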

As an alternative to a packed struct, you can declare the padding yourself so that the compiler has none left to add. Like:

typedef struct {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
  unsigned char mypadding[3];  // 3 is just a guess
} Struct_type;

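Then add a (static) assert to check that the sizeof the struct actually equals the sum of the sizeof of the individual members, so the build fails if the guess is wrong. Because mypadding is an ordinary member, it keeps whatever value you store in it.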
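A sketch of that check, assuming a C11 compiler for _Static_assert. The count of 3 is still only a guess for one particular platform, and this CreateStruct is a simplified stand-in for the real one:

```c
#include <string.h>

typedef struct {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
  unsigned char mypadding[3]; /* explicit padding; 3 is a platform guess */
} Struct_type;

/* Compilation fails here if the compiler still inserts padding of its
   own, i.e. if the guessed size of mypadding is wrong for this target. */
_Static_assert(sizeof(Struct_type) ==
                   2 * sizeof(double[3]) + sizeof(unsigned int) +
                       sizeof(unsigned char) + sizeof(unsigned char[3]),
               "Struct_type contains compiler-inserted padding");

/* Simplified stand-in for the real CreateStruct: zero everything first,
   including mypadding, then fill in the fields. */
void CreateStruct(Struct_type *Struct_instance) {
  memset(Struct_instance, 0, sizeof *Struct_instance);
  Struct_instance->a_uint32_field = 0U;
  Struct_instance->a_uint8_field = 0U;
}
```

Since every byte of the object now belongs to a member, later assignments to other fields leave the zeroed mypadding bytes alone.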

Yet another alternative: Don't write the whole struct to the file with a single fwrite. Write the individual members one by one using separate fwrite calls. That way the padding never reaches the file.
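A minimal sketch of the member-by-member approach, where WriteStruct is a hypothetical helper name and the struct is the one from the question:

```c
#include <stdio.h>

typedef struct {
  double vector1[3];
  double vector2[3];
  unsigned int a_uint32_field;
  unsigned char a_uint8_field;
} Struct_type;

/* Hypothetical helper: writes each member separately so that padding
   bytes are never copied to the file.
   Returns 0 on success, -1 if any write fails. */
int WriteStruct(FILE *file_pointer, const Struct_type *s) {
  if (fwrite(s->vector1, sizeof s->vector1, 1, file_pointer) != 1) return -1;
  if (fwrite(s->vector2, sizeof s->vector2, 1, file_pointer) != 1) return -1;
  if (fwrite(&s->a_uint32_field, sizeof s->a_uint32_field, 1,
             file_pointer) != 1) return -1;
  if (fwrite(&s->a_uint8_field, sizeof s->a_uint8_field, 1,
             file_pointer) != 1) return -1;
  return 0;
}
```

In main, this would replace the single fwrite call with WriteStruct(file_pointer, &Struct_instance). The file then holds only the member bytes, so it is shorter than sizeof(Struct_instance) whenever the struct has padding. The file being matched must use the same layout, and member sizes and byte order still depend on the platform.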

Support Ukraine