3

In our lecture we were discussing the inner possible implementation of std::list. The lecturer showed the approach where a dummy node is created to indicate the end of the list:

struct Node { Node* prev; ... }

Node* dummy = reinterpret_cast<Node*>(new int8_t[sizeof(Node)]);
dummy->prev = ... /* last node */;

They claimed that the last line might cause undefined behaviour. I cannot see why this could be unsafe. At the end of the day, we are just overwriting several bits without dereferencing the preseting. If this really was problem, wouldn't it have occured at the point of reinterpret_casting?

So, does udefined behaviour actually take place here, and, if so, why?

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
Zhiltsoff Igor
  • 1,812
  • 8
  • 24
  • `int8_t` might be allocated at an improper alignment to what `Node` needs to be allocated at. For example, if you're on a 32-bit platform, then `Node` must be allocated at a memory address divisible by 4, but `int8` can be allocated at any address. –  Jan 27 '22 at 14:45
  • @eerorika: Guaranteed by whom? I doubt that the standard of the language guarantees this. Oh, it's a pointer, so maybe... No wait, that code is allocating an array of `int8` elements, which can theoretically start at any address. –  Jan 27 '22 at 14:47
  • I don't know if this is specifically UB, but the proper approach would be to use the plain operator new for the memory allocation and use placement new to initialize the object. – vandench Jan 27 '22 at 14:54
  • 3
    Not that this is related to your question, but would `new Node(dummy signature)` work better than trying to hack a byte buffer? ctor of `Node::Node(dummy signature)` would also set the state of the members to something that supports the `dummy` node (vs. uninitialized BYTE values). – franji1 Jan 27 '22 at 15:19
  • @463035818_is_not_a_number `either it is UB or not.` Not necessarily. There are implementation defined differences between systems, and some operations can be well defined on one system, and UB on another. Example: `int i = 32767; i++;` – eerorika Jan 27 '22 at 15:38
  • @eerorika oh right. I didnt consider that implemention defined can be either. I don't think thats the case here. Anyhow, comment deleted (and tag added) – 463035818_is_not_an_ai Jan 27 '22 at 15:41
  • @franji1 I suppose this could work, but the question is not about the correct implementation, but rather about the lecturer's comment about the lines above causing UB – Zhiltsoff Igor Jan 27 '22 at 15:52
  • @eerorika [mandatory nitpick about `aslignas` interaction](https://stackoverflow.com/a/15512182) . But I don't think it disqualifies what you're saying, for default `operator new` you're correct. The bigger nitpick should be that casting from `int8_t` creates undefined behavior because of lifetimes technically. Whereas using `void* operator new(size_t);` would not have that issue AFAIK. – Mgetz Jan 27 '22 at 16:21
  • 2
    @ZhiltsoffIgor ask yourself when does the lifetime of the object pointed to by `dummy` start? If the constructor of `Node` is never called then the lifetime never starts and thus accessing `dummy->prev` is technically undefined behavior. Technically that could be solved by just explicitly calling the constructor e.g. `dummy->Node();` – Mgetz Jan 27 '22 at 16:25
  • 2
    @Mgetz: That's not how you explicitly call the constructor. placement-new is. `auto raw_memory = new std::byte[sizeof (Node)]; auto dummy = new (raw_memory) Node;` Same pointer value as the reinterpret_cast, except you didn't need to write a messy cast, and you've given clear instructions to the compiler about object lifetime. – Ben Voigt Jan 27 '22 at 16:32
  • 1
    @Mgetz _ask yourself when does the lifetime of the object pointed to by dummy start?_ The object pointed to by `dummy` is the first element of `int8_t[sizeof(Node)]` array, its lifetime is started when/after evaluating `new int8_t[sizeof(Node)]`. – Language Lawyer Jan 27 '22 at 16:34
  • @BenVoigt it's... been awhile since I've needed or wanted to use placement new. – Mgetz Jan 27 '22 at 16:36
  • 2
    @LanguageLawyer: For a trivially constructible object, its lifetime starts when memory of proper size and alignment is allocated (so says `[basic.life]`) So it hinges on whether `Node` is trivially-constructible. Or one could just use placement-new, which is shorter and handles the change in pointer type seamlessly. – Ben Voigt Jan 27 '22 at 16:36
  • 1
    @LanguageLawyer [Please review](https://en.cppreference.com/w/cpp/language/lifetime) the lifetime of the `int8_t` started then, the lifetime of the `Node` did not start then. We can quible about trivial constructors if you want... but the point stands in general. – Mgetz Jan 27 '22 at 16:37
  • 1
    @BenVoigt _So it hinges on whether `Node` is trivially-constructible_ It doesn't matter, since there is no `Node` object. – Language Lawyer Jan 27 '22 at 16:41
  • A better way to allocate a raw byte array of proper size and alignment is to use [`std::aligned_storage`](https://en.cppreference.com/w/cpp/types/aligned_storage) – Remy Lebeau Jan 27 '22 at 18:56
  • @franji1 Presumably, `Node` carries a payload of some user-provided type. `new Node()` might not work if that user-provided type doesn't have a default constructor. More generally, `std::list` might not have enough information to manufacture a dummy instance of that user-defined type. And even if it does, creating a dummy instance may produce output or have side effects, thus altering the observable behavior of the program; and therefore is not a permissible implementation technique. – Igor Tandetnik Jan 28 '22 at 05:06
  • @eerorika I think https://eel.is/c++draft/expr.new#16 requires only an array new with type `char`, `unsigned char` or `std::byte` to return a pointer suitably aligned for all fundamental types (independently of the alignment of the result from the allocation function call) and `int8_t` is probably not allowed to be `char` (https://stackoverflow.com/questions/15911714/are-int8-t-and-uint8-t-intended-to-be-char-types). – user17732522 Jan 28 '22 at 20:39
  • @user17732522 Fair enough, I agree with that. – eerorika Jan 28 '22 at 21:51

1 Answers1

4

First, for your second line

Node* dummy = reinterpret_cast<Node*>(new int8_t[sizeof(Node)]);

by itself.

new returns a pointer to the first int8_t object in the array of int8_t objects it created.

reinterpret_cast's behavior depends on the alignment of the address represented by the pointer. If it is suitably aligned for an object of type Node, then it will leave the pointer value unchanged (since there is definitively no Node object at the location which is pointer-interconvertible with the int8_t object). If it is not suitably aligned, the returned pointer value will be unspecified.

Unspecified means that we won't know what the value will be, but it wont cause undefined behavior.

Therefore, in any case, the second line and the cast by itself do not have undefined behavior.


The line

dummy->prev = ... /* last node */;

requires that the object dummy points to is actually a Node object. Otherwise it has undefined behavior. As mentioned above, reinterpret_cast gives us either an unspecified value or a pointer to the int8_t object. This already is an issue, that I think at least requires a std::launder call.

Even if the pointer returned from new is correctly aligned, then we still need to check whether a Node object is present. We certainly did not create any such object in any of the shown operations explicitly, but there is implicit object creation which may help out (at least since C++20, but I suppose this was supposed to be a defect report against older standard versions).

Specifically, objects may be created implicitly inside an array of types unsigned char, std::byte and, with some limitations, char (CWG 2489) when the lifetime of the array is started. int8_t is usually signed char and I think is not allowed to be either of the three previously mentioned types (see e.g. this question). This removes the only possible way out of UB.

So your third code line does have undefined behavior.


Even if you remedy this by changing the type form int8_t to std::byte, there are other constraints on the details of Node to make the implicit object creation possible. It may also be necessary to add a std::launder call.

All of this doesn't consider the alignment yet, because although new[] obtains memory with some alignment requirements, I think the standard mandates new[] itself to return a pointer with stronger alignment than required for the element type only for char, unsigned char and std::byte array new.

Many of these issues can probably be avoided by using e.g. operator new directly, possibly with provided alignment request, and making sure that Node is an aggregate.

In any case writing code like this is very risky because it is difficult to be sure that it isn't UB. It should be avoided when ever possible.

user17732522
  • 53,019
  • 2
  • 56
  • 105
  • I don't see how or why `std::launder` would make a difference here. Laundering is generally used to ensure that the optimizer won't see through constants from previously-used storage -- for example, if a `T` was constructed in that byte buffer that contained a `const` member before trying to construct a `Node` over top of it. IIRC it can also be used to return a buffer back to its original type after aliasing as well. Though since this is a newly allocated byte buffer, I'm pretty sure this would be entirely unrelated for both cases unless using a custom `new` implementation – Human-Compiler Jan 28 '22 at 20:43
  • @Human-Compiler Practically that may be true, but going by the standard, the result from `new` points to the first element of the array, so to an `int8_t` object. `reinterpret_cast` is specified to not change that only if there is a _pointer-interconvertible_ `Node` object at the same address. But even if there was a `Node` object for which the array provides storage, it doesn't match the requirements for _pointer-interconvertible_ (https://en.cppreference.com/w/cpp/language/static_cast#pointer-interconvertible). So `std::launder` would be required to get at pointer to the `Node` object. – user17732522 Jan 28 '22 at 20:50
  • @Human-Compiler Although I may be misunderstanding something, since the paper introducing implicit object creation has a similar example with `new char[...]` without using `std::launder`. (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html) I might ask a new question about this. – user17732522 Jan 28 '22 at 20:55
  • I'm uncertain now; it'd definitely make a good question. You might, in-fact, be right. I know `std::launder` is needed to indicate a lifetime of an object that _has_ been safely constructed in a buffer, but whose pointer from placement-`new` was discarded. If the lifetime materialization from P0593 counts as a similar case, then `std::launder` may be needed; though I suspect the pointer returned in this instance is _implicitly_ the new lifetime pointer. – Human-Compiler Jan 28 '22 at 21:20
  • The paper is just sloppy in its non-usage of `std::launder`. – Language Lawyer Jan 28 '22 at 21:21
  • @Human-Compiler I think I wont post a question. The comment above makes me sure enough that I am not wrong on that and I think there are other questions already which are close enough, although I haven't found a proper duplicate. I suppose if you are interested you could write up a question instead. – user17732522 Jan 28 '22 at 21:31
  • @Human-Compiler And P0593 also introduced the concept of _returning a pointer to a suitable created object_ for operations that implicitly create objects. This property basically says that the operations will return a pointer to any one of the implicitly created objects as required to give the program defined behavior. However `new` expressions are not specified to do that. (But `operator new` is.) – user17732522 Jan 28 '22 at 21:34