2

Minimum example:

#include <fstream>

struct TFoo
{
    bool Field1_ = false;
    uint64_t Field2_ = 0;
};

int main() {
    TFoo Foo_{};
    const char* filename = "text.txt";
    std::ofstream f(filename);

    f.write((char*)(&Foo_), sizeof(Foo_));
}

clang since 5th version in msan reports something like this:

Uninitialized bytes in __interceptor_fwrite at offset 0 inside [0x720000000000, 15)
==71928==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x2823aa  (/home/<hidden>/test-ofstream+0x2823aa)
    #1 0x27830f  (/home/<hidden>/test-ofstream+0x27830f)
    #2 0x272757  (/home/<hidden>/test-ofstream+0x272757)
    #3 0x271388  (/home/<hidden>/test-ofstream+0x271388)
    #4 0x270c96  (/home/<hidden>/test-ofstream+0x270c96)
    #5 0x2709e2  (/home/<hidden>/test-ofstream+0x2709e2)
    #6 0x26ff9e  (/home/<hidden>/test-ofstream+0x26ff9e)
    #7 0x7fbc7238382f  (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

It is because padding bytes between Field1_ and Field2_ are not initialized.

It is ok, MSAN is right.

But if the code contains very large examples of such code (saving structs to binary files), is there any beautiful method to massively make the code better?

(Packed structs is not a solution for us.)

vladon
  • 8,158
  • 2
  • 47
  • 91
  • Just writing the raw bytes of structs is not portable and inefficient and may break any time you change a compiler flag and the compiler uses different padding. If you want to serialize data serialize it. You can do so manually by finding a byte representation for `bool` and `uint64_t` or by using a serialization library such as [cereal](https://uscilab.github.io/cereal/). – nwp Jan 26 '18 at 08:56
  • @nwp Question is not about it. Consider there is a millions of struct which written and reading such a way. Platform, compiler and so on are the same. – vladon Jan 26 '18 at 09:12
  • @vladon _"Millions of structs"_ or _"millions of places in code where such writes/reads happen"_? – Daniel Langr Jan 26 '18 at 09:16
  • @DanielLangr Yes, consider this. So advice to "rewrite writing of all of them to serializing" is bad. – vladon Jan 26 '18 at 09:30
  • @vladon All right. And are you able to change the definition of `struct TFoo`? For instance to add a constructor to it? If so, look here: https://stackoverflow.com/q/19545557/580083. – Daniel Langr Jan 26 '18 at 09:40
  • I don't know what you expect. "Yes we are doing it wrong. No we can't change that." That doesn't really leave many possibilities. Just stop using the sanitizers and pray it will hold until you change jobs. – nwp Jan 26 '18 at 10:43

1 Answers1

1

If you can change the definition of struct TFoo, you can add a constructor like this:

struct TFoo {
  bool Field1_;
  uint64_t Fields_;
  TFoo() { memset(this, 0, sizeof(*this)); }
  TFoo(const TFoo &rhs) : TFoo() { Field1_ = rhs.Field1_; Field2_ = rhs.Field2_; }
};

I think you cannot use memset this way according to the standard, but it may work with your compiler. See How can I zero just the padding bytes of a class?, where this solution is discussed.

ORIGINAL ANSWER

This came to my mind to zero out padded bytes between Field1_ and Field2_. But honestly, I am not sure if it is Standard compliant. Definitely, some kind of serialization of TFoo objects would be much better, but if I understood correctly, it's not an option for you, is it?

struct TFoo
{
  bool Field1_ = false;
  uint64_t Field2_ = 0;
}; 

struct TFooWrapper {
  union {
    TFoo tfoo;
    char dummy[sizeof(TFoo)] = { 0 };
  } u;
};

UPDATE

From http://en.cppreference.com/w/cpp/language/union: At most one variant member can have a default member initializer. The above code is therefore likely not correct. But you can, e.g., define default constructor for TFooWrapper to zero out all bytes.

Daniel Langr
  • 22,196
  • 3
  • 50
  • 93
  • casting a pointer to `char*` to peek and poke individual bytes of the representation is allowed with regards to strict aliasing. – ratchet freak Jan 26 '18 at 11:00