My colleague likes to initialize struct using memset, how I can persuade him not to do so?

Question

My colleague likes to write code to initialize fields of struct using memset like this:

struct Fun {
    int mem;
    int cpu;
    std::map<int, int> cpumap;
    std::string str;

    Fun() {
       int size = (char*)&(this->cpumap) - (char*)this;
       memset(this, 0, size);
    }
};

He says this code is 100% correct and can do the job well. So is this the typical way of initializing a struct in C++? I mean the behavior of this kind of code is defined well?

In this particular case the code is correct however it is much harder to read and more prone to errors than a simple solution like `int mem{}; int cpu{};` . Perhaps you could approach it from those angles (readability and robustness in the face of change). What if the int was after the map for example? Or what if there were multiple constructors, is he going to copy-paste this code into all of them. — M.M, Mar 27 '16 at 05:25

score 2 · Answer 1 · edited May 23 '17 at 12:07

memset, malloc, calloc and so on are the C-way of doing things - they're not C++-idomatic and are only really supported in C++ so you can use C code directly. Note that even in C you can use the struct someStruct = {0} syntax to zero-initialize a struct, so using memset with structs is unnecessary even in C. memset is really meant for zeroing buffers, not objects.

As for his assertions about correctness, I'll say that he is factually incorrect.

Here's a laundry-list of my observations:

It requires the programmer to manually put the first and last members in the expression to calculate size (and why not just use the sizeof operator?).
- I note that example you gave is unclear in its intention: while it clears the three scalar members, is it also meant to clear the mem and cpumap members? (What if another programmer added those two fields and forgot to update the constructor?)
It fails in the case of inheritance: the value of this would point to the start of the topmost parent, not the first non-inherited field, so in addition to blindly overwriting parent data, you would be doing this multiple times if parent constructors have the same "initialization" logic.
The size calculation happens at run-time, not compile-time, which wastes CPU cycles.
He's using int size instead of size_t size so it might not work on systems where sizeof(void*) != sizeof(int) (e.g. x64, some obscure ISAs, certain embedded architectures, etc)
He's blindly casting to (char*) even when that might be inappropriate. While sizeof(char) is guaranteed to be 1 I don't believe it is guaranteed for char* to always be an appropriate proxy for void*.
- Also this is a C-style cast. In C++ the cast operators static_cast, reinterpret_cast, and dynamic_cast are always preferred over (T)-style casts.
It makes the assumption that all members exist in the range defined by their declaration order. You cannot make this assumption in C++ (see here: Do class/struct members always get created in memory in the order they were declared? ) because the 1998 and 2003 specifications state:

The order of allocation of nonstatic data members separated by an access-specifier is unspecified

So his code would depend upon undefined-behaviour in this case:
```
struct Foo {
private:
    int a;
    int b;
public:
    int c;
private:
    int d;
}
Foo::Foo() {
    int size = (char*)&this.d - (char*)&this.a;
}
```
Dangerously, you cannot make assumptions that a zeroed member is "valid" - the implementation of std::map and std::string might have internal members which cannot be zero, by blindly wiping them you put them into an unknown state. This is dangerous.

Point is: do not do this.

The C++ way is using initialization lists, which offer a lot of compile-time safety and guarantees, and require an explicit initial value which is guaranteed to be type-safe. The syntax is:

struct Foo {
    someType x;
    int y;
    foo bar;
};

Foo:Foo() :
    x(0),
    y(0),
    bar(some_initial_bar_value) {
    // any sequential init logic goes here
}

(First bullet point) You seem to have missed the point. The code intends to initialize mem and cpu only, and does it in a fully legal, if ugly, way. — n. m. could be an AI, Mar 27 '16 at 04:49
Actually this is done intentionally, so memset will not zeroing std::map and std::string. ``I note that example you gave is incorrect in itself: it only calculates the distance between this.mem and this.cpumap - it ignores this.str.'' — bugs king, Mar 27 '16 at 04:53
The point starting "He's using " is not a valid objection; even in the case you mention, the difference will fit in an `int` . (I suppose theoretically the compiler could introduce more than `INT_MAX` bytes of padding but in practice that would not happen) — M.M, Mar 27 '16 at 05:28
"He's blindly..." is also not a valid objection: any object pointer can be cast to `char *` , and the subtraction does behave as expected. Not sure why you mention `void *`, because subtraction of `void *` is not defined. — M.M, Mar 27 '16 at 05:29
Regarding the order, OP's code does *not* have an access specifier separating the members being subtracted — M.M, Mar 27 '16 at 05:30

score 2 · Answer 2 · answered Mar 27 '16 at 05:25

The idiomatic C++ way is

Foo(): mem(0), cpu(0) {}

If your colleague likes to maintain the code alone, and keep it all in his head, more power to him. For us mere mortals his code is hard to maintain, because

It is not idiomatic C++. It is not clear what it does at a first glance.
It is uglier and often (as in this case) longer than the idiomatic code; if the idiomatic code is longer, your class is probably too big!
Changing the order of the members will break it.
Adding a virtual function will break it.
Adding an access specifier in the middle will break it.

Any single bullet point above would be quite enough to ditch this style for good. If all of them don't convince your colleague, I don"t know what will.

score 0 · Answer 3 · edited May 23 '17 at 12:31

I think both of the top answers here address the issue:

memset() or value initialization to zero out a struct?

You don't actually know how std::map or std::string is implemented, and setting all the bits inside them to 0 may very well screw up their internals. Also, though unlikely the case now, there is no guarantee made by C++ that any floating point type or pointer type will evaluate to positive 0 when all of its bits are 0. C++ only guarantees that for integral types. There's a brief discussion in the comments there. Again, not a likely problem, but you are technically relying on implementation details there.

Edit: 1201ProgramAlarm comments that he does seem to be aware of the first problem, and doesn't use memset on the map or the string. And since, in your example, it only applies to integers, I think he's used it "safely." Still, he's doing pointer math just to get out of writing:

  Fun() : mem(0), cpu(0) { }

right?

I had similar thoughts initially then realized that `cpumap` and `str` are not included in the `memset`. But it is still a bad idea. — 1201ProgramAlarm, Mar 27 '16 at 04:26
The posted snippet does not attempt to initalize either std::map or std::string member. — n. m. could be an AI, Mar 27 '16 at 04:34

score 0 · Answer 4 · edited May 23 '17 at 12:15

Firstly, you should never calculate the size of a struct based on the address of it's members. Because the compiler can do optimizations in the packing of objects into memory, the size of the struct can vary.

The only true way to get the size of a type is via sizeof, which is filled in by the compiler at compile time. This grantees the size you get is the actual size the compiler used. An example of varying struct packing sizes can be seen here.

As for initializing structures, there is a language defined way of doing so: structure initialization, struct StructType a = {0};.

This improves code readability, as it is the accepted way to set a struct.

Additional, it is a clearer signal to the compiler what your goal is then using a memset, making it easier for the compiler to create optimizations for your code. A compiler may not be smart enough to realize it can simply set the allocated memory to zero and remove the memset call. However the definition of the initialization is to do so.

My colleague likes to initialize struct using memset, how I can persuade him not to do so?

4 Answers4