C++ placement new after memset

Question

Suppose there's a struct whose constructor does not initialize all member variables:

struct Foo {
  int x;
  Foo() {}
};

If I memset some buffer to 0, use placement new on that buffer to create an instance of Foo, and then read x from that instance, is that defined behavior?

#include <cstring>
#include <iostream>
#include <new>

void bar(void* buf) {
  std::memset(buf, 0, sizeof(Foo));
  Foo* foo = new(buf) Foo;
  std::cout << foo->x; // Is this undefined behavior?
}

[Here is a nice CppCon talk](https://youtu.be/IAdLwUXRUvg) showing how deep this rabbit hole is. – Marek R, Aug 09 '21 at 15:24

score 14 · Accepted Answer · 2021-09-05T20:07:14.860

As a supplement to the other answer:

On the off chance that anyone feels like handwaving this away as "technically undefined behavior, but safe enough for me", allow me to demonstrate how thoroughly broken the resulting code can be.

If x is initialized:

struct Foo {
  int x = 0;
  Foo() {}
};

// slightly simpler bar()
int bar(void* buf) {
  std::memset(buf, 0, sizeof(Foo));
  Foo* foo = new(buf) Foo;
  return foo->x; 
}

g++-11 with -O3 produces the following:

bar(void*):
        mov     DWORD PTR [rdi], 0   <----- memset(buf, 0, 4) and/or int x = 0
        xor     eax, eax             <----- Set the return value to 0
        ret

Which is just fine. In fact, it doesn't even exhibit whatever overhead one could hope to eliminate via in-place uninitialized construction. Compilers are smart.

In contrast to that, when leaving x uninitialized:

struct Foo {
  int x;
  Foo() {}
};
// ... same bar

We get, with the same compiler and settings:

bar(void*):
        mov     eax, DWORD PTR [rdi] <----- Just dereference buf as the result ?!?
        ret

Well, it's certainly faster, but what happened to the memset()?

The compiler figured that since we put an uninitialized int (aka junk) on top of the freshly memsetted memory, it doesn't even have to bother with the memset() in the first place. It can just "recycle" the junk that was there beforehand.

In effect, anything -> 0 -> anything collapses to just anything. Leaving the memory pointed to by buf untouched is therefore a reasonable interpretation of the code.

You can play around with these examples on godbolt here.
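
For contrast, here is a minimal sketch of one way to get a well-defined zero without relying on memset() at all. It assumes you are free to change the type; `FooDefaulted`, `bar_value_init` and `demo` are hypothetical names that do not appear in the original code. Because the default constructor is defaulted rather than user-provided, value-initialization (the trailing parentheses) zero-initializes x.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical variant of Foo: the default constructor is defaulted on its
// first declaration, so it is not user-provided.
struct FooDefaulted {
  int x;
  FooDefaulted() = default;
};

// Value-initialization (note the parentheses) zero-initializes the object
// before the trivial constructor runs, so reading x is well-defined.
int bar_value_init(void* buf) {
  assert(buf != nullptr);  // buf must also be suitably aligned and sized
  FooDefaulted* foo = new (buf) FooDefaulted();
  return foo->x;
}

// Helper for trying it out: the buffer is deliberately filled with non-zero
// bytes first, to show that the result does not depend on its prior contents.
int demo() {
  alignas(FooDefaulted) unsigned char storage[sizeof(FooDefaulted)];
  std::memset(storage, 0xFF, sizeof(storage));
  return bar_value_init(storage);
}
```

Note that this does not help with the original `Foo() {}`: a user-provided constructor is simply called during value-initialization, and x stays uninitialized.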

edited Sep 05 '21 at 20:07

answered Aug 09 '21 at 15:03

Footnote: I *think* the compiler would have been within its right to just leave `eax` as is in the second case. But I can see how returning the value stored in the object's storage might be consistent with the additional aliasing safety gcc adds for grandfathered-in unions. – Aug 09 '21 at 15:12
Another thing to play with: `Foo* foo = new (std::launder((Foo*)buf)) Foo;`. – Evg Aug 09 '21 at 16:51
@Evg Doesn't `std::launder` require an object to be already at that location? This "works" because the compiler has to assume this is the case (even though it's not), but I'm not sure this is formally well-defined, unless IOC kicks in somehow. – Aug 09 '21 at 16:57
It does. But why it changes the generated assembly I can't say. – Evg Aug 09 '21 at 19:35
@Evg The name for `std::launder` in the standard is **pointer optimization barrier**. It forces the compiler to assume that the new `Foo` and `buf` do not alias each other, despite it being patently obvious. So the memset() applies to `buf`, and the `Foo` construction applies to the new `Foo` pointer, and the compiler is forced to treat them as separate addresses. – Aug 09 '21 at 19:52
And how does this explain setting `eax` to zero when `std::launder` is used? Looks quite the opposite. – Evg Aug 09 '21 at 20:16
@Evg I *think* this might just be what gcc does for uninitialized ints without clearly defined provided storage: https://gcc.godbolt.org/z/Mo7exn4zq (but it could be a myriad other things of course) – Aug 09 '21 at 20:20
But that zero is a zero from `memset`. Try `memset` with `1`. – Evg Aug 09 '21 at 20:43

score 12 · Answer 2 · answered Aug 09 '21 at 14:15

It is textbook undefined behavior. Member x is still uninitialized after the constructor runs, and reading an uninitialized variable is undefined behavior.

The fact that this memory was previously filled with something else is irrelevant.
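
A minimal sketch of the straightforward fix, assuming the constructor itself can be changed: initialize every member in the constructor, so the value read afterwards comes from the constructor and not from whatever the buffer held. `FooInit`, `bar_init` and `demo_init` are hypothetical names, and 42 is an arbitrary value chosen to differ from the zero written by memset.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical variant of Foo whose constructor initializes every member.
struct FooInit {
  int x;
  FooInit() : x(42) {}
};

// Same shape as bar() in the question, but the read is now well-defined.
int bar_init(void* buf) {
  assert(buf != nullptr);  // buf must also be suitably aligned and sized
  std::memset(buf, 0, sizeof(FooInit));
  FooInit* foo = new (buf) FooInit;
  return foo->x;  // value set by the constructor, not by memset
}

// Helper that supplies correctly aligned storage for one FooInit.
int demo_init() {
  alignas(FooInit) unsigned char storage[sizeof(FooInit)];
  return bar_init(storage);
}
```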

answered Aug 09 '21 at 14:15

SergeyA

Reading a non-initialized variable is UB? This is not true at all. It will always work and never fail. The only thing is that the variable's value is not defined (i.e. you can't assume that it will always be 0 or something else in such a case). – Alexander Dyagilev Aug 09 '21 at 14:28
@alagner, yes, not initialized after the object's lifetime begins. The constructor is called, but the member variable is not initialized. – SergeyA Aug 09 '21 at 14:33
@AlexanderDyagilev you are quite wrong. Reading an uninitialized variable is textbook undefined behavior. – SergeyA Aug 09 '21 at 14:34
@AlexanderDyagilev What do you mean it will always work? UB doesn't mean fail to compile. UB means you can't know what the behavior is going to be. In this example placement new could initialize the buffer to some debug representation which means the value of `x` can be different across different implementations. Also, there is the standard which straight up states it is UB: https://timsong-cpp.github.io/cppwp/basic#indet-2 – NathanOliver Aug 09 '21 at 14:35
@SergeyA yeah right, my bad, I focused on the placement new part too much. – alagner Aug 09 '21 at 14:36
@AlexanderDyagilev - Well... up until your program crashes over one miserable bool... https://stackoverflow.com/q/54120862/817643 – StoryTeller - Unslander Monica Aug 09 '21 at 14:43
@PeteBecker I guess you’ve addressed the wrong person ;) – alagner Aug 09 '21 at 14:58
@AlexanderDyagilev -- "undefined behavior" does not mean "something bad will happen". It simply means that the C++ language definition doesn't tell you what a program that includes that behavior will do. Typically, such a program will "work" just fine. Until you're giving a demo to your most important customer, when it will crash. – Pete Becker Aug 09 '21 at 15:01
