11

A coworker wanted to write this:

std::string_view strip_whitespace(std::string_view sv);

std::string line = "hello  ";
line = strip_whitespace(line);

I said that returning string_view made me uneasy a priori, and furthermore, the aliasing here looked like UB to me.

I can say with certainty that line = strip_whitespace(line) in this case is equivalent to line = std::string_view(line.data(), 5). I believe that will call string::operator=(const T&) [with T=string_view], which is defined to be equivalent to line.assign(const T&) [with T=string_view], which is defined to be equivalent to line.assign(line.data(), 5), which is defined to do this:

Preconditions: [s, s + n) is a valid range.
Effects: Replaces the string controlled by *this with a copy of the range [s, s + n).
Returns: *this.

But this doesn't say what happens when there's aliasing.

I asked this question on the cpplang Slack yesterday and got mixed answers. Looking for super authoritative answers here, and/or empirical analysis of real library vendors' implementations.


I wrote test cases for string::assign, vector::assign, deque::assign, list::assign, and forward_list::assign.

  • Libc++ makes all of these test cases work.
  • Libstdc++ makes them all work except for forward_list, which segfaults.
  • I don't know about MSVC's library.

The segfault in libstdc++ gives me hope that this is UB; but I also see both libc++ and libstdc++ going to great effort to make this work at least in the common cases.

Quuxplusone
  • 23,928
  • 8
  • 94
  • 159
  • Did you compile the test cases with ASan and/or run them under Valgrind? That would take the guesswork out of whether the code causes access violations, though it might still work in practice rather than by definition. – Konrad Rudolph Feb 13 '20 at 16:47
  • 1
    "If any member function or operator of basic_­string throws an exception, that function or operator has no other effect on the basic_­string object." -- this forces allocation of storage to occur before existing storage gets freed, so that an exception gets thrown if allocation fails, without altering `*this`. But I see nothing to prevent the existing storage being reused, in which case this becomes unspecified, since the semantics of copy-overing the storage is unspecified. – Sam Varshavchik Feb 13 '20 at 16:55
  • [Related/duplicate for the `std::string::assign` case](https://stackoverflow.com/questions/28142011/can-you-assign-a-substring-of-a-stdstring-to-itself) and [for the container case (`std::vector::assign` specifically)](https://stackoverflow.com/questions/36338603/assign-part-of-a-vector-to-itself-using-stdvectorassign). – walnut Feb 13 '20 at 16:59
  • 2
    For the sequence containers mentioned, it is certainly UB, because of precondition violation of the `assign` requirements in [\[tab:container.seq.req\]](https://eel.is/c++draft/containers#tab:container.seq.req). – walnut Feb 13 '20 at 17:12

1 Answers1

8

Barring a couple of exceptions of which yours is not one, calling a non-const member function (i.e. assign) on a string invalidates [...] pointers [...] to its elements. This violates the precondition on assign that [s, s + n) is a valid range, so this is undefined behavior.

Note that string::operator=(string const&) has language specifically to make self-assignment a no-op.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • Hope you don't mind my edit: now I can upvote in clear conscience! – Bathsheba Feb 13 '20 at 17:19
  • 1
    So what exactly is the point of invalidation and the point at which the precondition is required to hold? The answer seems to be assuming that the precondition must hold after the member function has been called. – walnut Feb 13 '20 at 17:22
  • 1
    @walnut I am no language-lawyer (neither a person with a particularly extended C++ knowledge), but when we inverse your scenario, we can ask a question - could the range be invalidated *during* the execution of `assign`? If yes, then we would have to set a specific point *inside* the implementation of assign to mark when exactly the invalidation can occur, and I believe that's not something C++ would do. I could be wrong though. – Fureeish Feb 13 '20 at 17:25
  • 2
    @Fureeish I don't know either, but see e.g. [LWG issue 526](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#526), closed as "*not a defect*", which mentions in its recommendation for closure that `std::vector::insert(iterator pos, const T& value)` must work if `value` is into the vector itself, because the standard doesn't specify that it is allowed not to work, even though that reference might be invalidated by the call. – walnut Feb 13 '20 at 17:30
  • 1
    @walnut "*is required to work because the standard doesn't give permission for it not to work.*" - ***love it***. Sooo... is it worth to ask what happens *in practice*? Is the implementation required to make a *copy* of the argument in such situation? How you could realistically implement it..? I've heard about standard requiring compilers to do the impossible - is it one of those cases? Regardless, thank you for the comment! – Fureeish Feb 13 '20 at 17:35
  • 1
    @Fureeish Actually my previous (now deleted) example wasn't actually testing what I wanted to test. [Here](https://godbolt.org/z/JxY8Na) is a fixed example showing that both libc++ and libstdc++ actually do copy before moving on reallocation as required. – walnut Feb 13 '20 at 18:22