22

Consider this code:

#include <cstring>

template<typename T>
struct DefaultMsgImpl
{
    DefaultMsgImpl() { memset(this, 0, sizeof(T)); }
};

struct Msg : DefaultMsgImpl<Msg>
{
    int num;
};

int f()
{
    Msg msg{.num = 66};
    return msg.num;
}

With GCC 13 (and older), f() returns 0, but with all other compilers (MSVC, Clang, ICC, ICX, ...), f() returns 66.

Is the above code valid C++ (in which case it seems like GCC is miscompiling it), or is it undefined behavior (e.g. because of the memset touching the memory where Msg will live before T's lifetime begins)? If it is UB, I'd appreciate an answer citing the C++ standard.

Demo: https://godbolt.org/z/1qf9dv8cG

John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • 5
    Well, as you should know, using `memset(this, ...)` is almost *never* a good idea... :D Interesting all the same though. – Some programmer dude Jul 04 '23 at 11:08
  • 2
    Your code assumes an equality that is not true. `this` in DefaultMsgImpl is not equal to `this` in Msg. https://godbolt.org/z/hcEMMs64e. And memset imo has not much use anymore in C++, either use {} initialization or std::fill – Pepijn Kramer Jul 04 '23 at 11:15
  • 3
    @PepijnKramer You should assign the pointer-member *after* the memset, not before. – Deduplicator Jul 04 '23 at 11:21
  • 1
    MSVC agrees with clang: https://godbolt.org/z/W61sY3E9r Here is a similar problem: https://stackoverflow.com/a/68713640/1387438 - but there is clear undefined behavior. IMHO here return value should be 66, since memset here is should be ignored (as described in other answer). `num` is considered uninitialized until constructor of `Msg` initializes it. Now question is is this UB or not? – Marek R Jul 04 '23 at 11:27
  • Reading the comment by @PepijnKramer I think it *is* UB. When you do `memset` the `this` pointer is pointing to a `DefaultMsgImpl` object, and its size is *not* equal to `sizeof(Msg)`. So your `memset` will go out of bounds. – Some programmer dude Jul 04 '23 at 11:37
  • Sonar complains: https://godbolt.org/z/oze1cjdbb – Marek R Jul 04 '23 at 11:38
  • @MarekR: Thanks for the Sonar link, that's interesting. It would help if someone can point out a GCC option to make it warn about this...so far I don't know of any. – John Zwinck Jul 04 '23 at 11:39
  • @Deduplicator Could you please elaborate a bit on why it's different. We do not change the pointer itself as far as I understand. – Dmitry Jul 04 '23 at 11:44
  • @JohnZwinck Sonar complains, but in theory it can be false positive, since code has usual pattern and limitation of the tool may have been reached. I would treat this as guide toward finding explanation of the problem. I'm betting it is UB, but I do not have good formal explanation of it. – Marek R Jul 04 '23 at 11:48
  • @Deduplicator Woops – Pepijn Kramer Jul 04 '23 at 12:04
  • 2
    As I understand it, `memset()` only has defined behaviour if used to manipulate instances of trivial types (a corollary would be undefined behaviour for non-trivial types). `DefaultMsgImpl` is a non-trivial type since it has a user-defined constructor (which calls `memset(this, ...)`), and `Msg` is non-trivial since it has a non-trivial base. From memory, `g++` has an option `-Wclass-memaccess` to warn about a number of potentially unsafe usage of "raw memory functions" (`memcpy()`, `memset(),` etc) which *might* be relevant in this case. – Peter Jul 04 '23 at 12:29
  • 1
    @Peter: The requirement for memset() is that the type is trivially copyable: https://en.cppreference.com/w/cpp/string/byte/memset and in this case, it is: https://godbolt.org/z/sx7rfxGco . Also, `-Wclass-memaccess` does not warn about this (but it would if the base class had a virtual function for example). – John Zwinck Jul 04 '23 at 19:47
  • ...I do see that memset() says it's UB if the object is a "potentially-overlapping subobject" and that is defined as "A subobject is potentially overlapping if it is a base class subobject". Maybe this is the root cause? I'm not 100% convinced yet though, given (1) GCC is the only compiler which produces the unexpected result and (2) UBSan and ASan do not complain about it. – John Zwinck Jul 04 '23 at 19:53
  • I think touching a field of a subclass in a constructor/destructor of its' superclass is UB. It has nothing to do with `memset`, you get the same behaviour with `((Msg *) this)->num = 0;` – yyyy Jul 05 '23 at 14:14
  • @yyyy: If you're right I'd like to see a citation from the standard about it (from you or someone else). The problem I have with your intuitive answer is that it would not be UB to `malloc(sizeof(Msg))` then memset then construct Msg via placement new. Clearly the memset is not operating on a "field of a subclass", because that subclass's lifetime has not yet begun, but I would expect that the storage for the subclass does exist and that the designated initializer takes effect afterward (as it does in all compilers other than GCC). – John Zwinck Jul 05 '23 at 15:46
  • @JohnZwinck I think very old MSVC would produce similar result, albeit MSVC didn't support the C-style initialization and in GCC it was an extension for long time. Honestly, it shouldn't have been legal because Msg is not a trivially-constructed class, so something fishy is here... 'Fishy" is ok in case of UB. – Swift - Friday Pie Jul 05 '23 at 18:53
  • Note, extended designated initalizers `{.num = 66};` were C99 and not allowed by ISO-C++ until C++20. So this is by definition ill-formed code. – Swift - Friday Pie Jul 05 '23 at 19:12
  • 1
    @JohnZwinck Just a side note: GCC's behavior is not consistent, the call to constructor may happens before or after the designated initializer. https://godbolt.org/z/q3T9P5vvY – yyyy Jul 06 '23 at 08:04
  • @Swift-FridayPie: Why do you say it's ill-formed code when you yourself say it is allowed in C++20? I am compiling the code in C++20 mode, so I don't understand your point. – John Zwinck Jul 08 '23 at 06:42
  • @JohnZwinck c++20 GCC support is expertimental and not always correct\complete\compliant. Formally this feature started to be supported from GCC8, but your variant (an aggregate inizalization of a non-trivial class) always produces warnings warning: missing initializer for member 'Msg::. De-facto GCC7 and some earlier support designated intializers for trivial classes (formally, it's an aggregate initialization) as an extension, with `-std=GNU++17` – Swift - Friday Pie Jul 08 '23 at 08:33

1 Answers1

3

It's an UB what happens in this case, regardles of method you set Msg. Accessing pure virtual methods overridden by derived class or accessing derived class's non-static members from constructor of base class is an UB.

Formally type T a.k.a. Msg is not constructed yet. So while doing something like this from a method after Msgs initialization is (relatively) fine and is the core of CRTP pattern, doing this from constructor is not.

Note, extended initalizers {.num = 66}; were a C99 feature but had preliminary historical support in some C++ compilers (or while in C++ mode in case of bilanguial compilers), which makes this code formally ill-formed until ISO C++20. In -pedantic mode GCC would warn about this. How exactly the extension should work when mixed with constructors, never was defined. De-facto, GCC first performs initialization, and then runs constructor's body.

Designated initialization is aggregate initialization. In ISO C++20 it's allowed only if there is no inherited or user-defined constructor present, which isn't the case here. E.g. on GCC's version of GNU-C++17 that appears not a requirement.

By ISO standard the elements of an non-union aggregate can be either default-initialized or initialized explicitly. Otherwise code is ill-formed, no diagnostics required. (9.4.5 Aggregate Initializers n 4868). You circumvent compiler's possible but unrequred complaints by using memset.

If there is a certain egg-or-chicken problem, the pattern can be extended by a base class which would hold all non-static members and definitions required by CRTP base:

template <typename T>
struct MsgTrait {};

template<typename T>
struct DefaultMsgImpl : MsgTrait<T>
{
    DefaultMsgImpl(int num) { this->num = num; }
};

// Msg
struct Msg;
template <>
struct MsgTrait<Msg> {
    int num;
    //something else?
};    

struct Msg : DefaultMsgImpl<Msg>
{
    Msg(int MsgNumber) : DefaultMsgImpl(MsgNumber) {}
};

int f()
{
    // Msg msg{.num = 66}; cannot do that. And Msg cannot be trivially constructed
    Msg msg{66};
    return msg.num;
}

MsgTrait<Msg> would be constructed and initialized first, and is considered a complete type, with all its nested types if any present, as long as its specialization is defined before attemption to instantiate DefaultMsgImpl<Msg>

Swift - Friday Pie
  • 12,777
  • 2
  • 19
  • 42
  • 2
    Small correction: Using virtuals, as long as they aren't pure virtuals, of the type currently under construction is fine. One just has to refrain from explicitly accessing not yet constructed sibling or containing objects. Not that it changes anything for the scenario under discussion. – Deduplicator Jul 12 '23 at 09:24
  • @Deduplicator pure in current class, yes, because overriding mechanism doesn't work. "Calling", if we use Python term, a member of derived class is on the same shelf - neither vtable nor storage are initualized yet. – Swift - Friday Pie Jul 12 '23 at 19:23