Calling `std::vector<A>::data()` on `A` with const or reference fields, before C++20

Question

This is a followup on the answers for placement new on a class with reference field.

Calling std::vector<A>::data() on a type A that has reference or const fields returns a pointer to objects that the vector may later replace via placement new. Such a replacement changes a const or reference field of the original object while that object is still being referred to through the pointer returned by data().

For example:

#include <iostream>
#include <vector>

struct A {
    const int i = 0;
};

int main() {
    std::vector<A> vec = {{1}, {2}};
    auto ptr = vec.data();
    std::cout << ptr[1].i << std::endl; // 2
    vec.pop_back();
    vec.push_back({3}); // placement new happens inside push_back
    std::cout << ptr[1].i << std::endl; // 3
}

C++17 tried to resolve such issues by introducing std::launder, but it was later agreed that, while std::launder may solve other issues, it doesn't really solve the problem for the above use case, as noted in NB US042.

Some questions - for C++ versions prior to C++20:

Since NB US042 was accepted as a change for the C++20 spec, but not marked as a DR - would it be advised, according to the spec only, to avoid the use of std::vector<A>::data() on a type A that has reference or const fields, as in the example above?
Or does the wording of the spec for std::vector<>::data() cover that, making it legal and leaving the implementability question to the library implementers?
If it is the latter, what can the library do to make it legal?
If it can't really do anything useful to make it legal, is it UB before C++20?
If it is UB before C++20, why wasn't this change considered to be a candidate for a DR, same as p0593r6? Most probably compilers do the right thing anyway, why not mandate that retroactively?

I deleted my answer because it does not answer the question. Still, the example given above does not seem to highlight the problem. If one uses `pop_back`, the next `push_back` is applied to a "fresh" (unoccupied) memory. `std::launder` was introduced in C++17 to fix some problems where placement new is used on memory that is currently occupied by a different object. I wonder whether a realistic example of the problem with `.data()` exists at all. Let me reiterate: placement new on unoccupied memory should work fine, why should it not? — zkoza, Sep 03 '20 at 18:11
@zkoza "*If one uses `pop_back`, the next `push_back` is applied to a "fresh" (unoccupied) memory*" - that's incorrect. — Fureeish, Sep 04 '20 at 13:08
The position in memory of the vector data can be reallocated, thus after a pop_back and a later push_back, I do not expect the pointer to reference (always) valid data. I consider this UB. The modification for C++20 is IMO just avoiding the active seeking and warning of such situations by the compiler (which could lead to complex/unnecessary implementation). — Adrian Maire, Sep 04 '20 at 13:31
@AdrianMaire while some `push_back`s will reallocate memory, if you `pop_back` once and then `push_back` once, that particular `push_back` will never allocate any memory for the vector. — Fureeish, Sep 04 '20 at 13:36
@Fureeish Can `push_back` just after `pull_back` destroy or modify in an uncontrollable manner a valid object? — zkoza, Sep 04 '20 at 14:17
@zkoza there is no such thing as `pull_back` - I'll assume you meant `pop_back`. `pop_back` correctly destroys the last element (calls the appropriate destructors). The memory is still allocated. Then, if we call `push_back`, a *placement `new`* call will be made to **construct** another object in the storage that was occupied by the object that we previously destroyed. Not sure whether that answers your question. What do you mean by "*destroy or modify in an uncontrollable manner*"? — Fureeish, Sep 04 '20 at 14:19
Not sure why this is a problem. The first `p+1` and the second `p+1` are two different pointers. One only ever refers to the old object, and gets destroyed at the end of the expression. The other one only ever refers to the new object. `p` itself doesn't have any problem, the object it points to never changes. OTOH if you save `p+1` to a named variable, this is (or should be) illegal regardless of whether there are any const or reference subobjects, because an invalidated pointer should never be revalidated back. — n. m. could be an AI, Sep 07 '20 at 12:37
@n.'pronouns'm. Theoretically, before C++20, the compiler is allowed to cache the value retrieved from `ptr[1].i` so you may not see that the value was changed. — Amir Kirsh, Sep 07 '20 at 15:50
"the compiler is allowed to cache" I don't understand what this sentence means from the standard's point of view. There are no compilers in the standard. The *value* doesn't change but this is not what determines validity. Provenance matters. Two pointers may compare equal but one can be valid and the other not. — n. m. could be an AI, Sep 07 '20 at 16:17
@n.'pronouns'm. It relates to the object's lifetime rules that were changed a bit in C++20: https://eel.is/c++draft/basic.life#8 vs. https://timsong-cpp.github.io/cppwp/n4659/intro.object#2.3 - the restriction on accessing an object through the same name after it is "reborn" in the same storage location, when the object has "_a reference member or a const subobject_", was removed. The compiler is allowed to make optimizations based on undefined behavior (assuming something cannot happen because it is undefined, and optimizing accordingly). — Amir Kirsh, Sep 07 '20 at 17:06

Ben Voigt · Answer 1 · 2022-04-15T19:28:16.877

Whether this code is valid depends on whether the implementation has strict pointer safety, a concept that was later removed from the standard (in C++23).

Simply, the pointer ptr is valid for arithmetic in a range from [ptr, ptr+2) immediately after the call to data().

After the call to pop_back(), any saved pointer to index 1 is certainly invalid, due to the iterator invalidation rules for std::vector<T>::pop_back():

Effect: Invalidates iterators and references at or after the point of the erase.

After the call to pop_back(), the pointer ptr obtained earlier is no longer valid for arithmetic in its original range (using it to compute ptr+1 no longer results in a "safely-derived" pointer value).

After the call to push_back(), strict safety of arithmetic using ptr is not restored. However, indexing using the original ptr, which has not been invalidated by pop_back() (only the reachable range was reduced), is still allowed on implementations with "relaxed pointer safety". That is, the expressions ptr+1 and ptr[1] involve valid-but-unsafely-derived pointer values.

A new call to data() returns a pointer value which compares equal to the original, but which can, unlike the old saved pointer, be used to safely derive values out to the current length of the vector. Again, implementations with relaxed pointer safety don't care.

The change made for C++20 for lifetime of objects having const members has no effect here, because the use of existing pointers to refer to the replacement object was not forbidden solely by [basic.life] but also by the iterator invalidation clause in pop_back.
