Iterator invalidation by `std::string::begin()`/`std::string::end()`?

Question

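```cpp
#include <string>
#include <iostream>

int main() {
    std::string s = "abcdef";

    std::string s2 = s; // under copy-on-write, s2 may share s's buffer

    // begin() const: called through a const reference, yields a const_iterator
    auto begin = const_cast<std::string const &>(s2).begin();
    // non-const end(): yields a mutable iterator
    auto end = s2.end();

    std::cout << end - begin << '\n';
}
```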

This code mixes the result of `begin() const` with the result of `end()`. Neither of these functions is permitted to invalidate any iterators. However, I'm curious whether the requirement that `end()` not invalidate the iterator variable `begin` actually means that `begin` is usable together with `end`.

Consider a C++98 copy-on-write implementation of `std::string`: the non-const `begin()` and `end()` functions cause the internal buffer to be copied, because the result of these functions can be used to modify the string. So `begin` above starts out valid for both `s` and `s2`, but the call to the non-const `end()` member causes it to no longer be valid for `s2`, the container that produced it.

The above code produces 'unexpected' results with a copy-on-write implementation such as libstdc++: instead of `end - begin` being equal to `s2.size()`, libstdc++ produces a different number.

Does causing `begin` to no longer be a valid iterator into `s2`, the container it was retrieved from, constitute 'invalidating' the iterator? If you look at the requirements on iterators, they all appear to hold for this iterator after `.end()` is called, so perhaps `begin` still qualifies as a valid iterator, and thus has not been invalidated?
Is the above code well defined in C++98? In C++11, which prohibits copy-on-write implementations?

From my own brief reading of the specs, it appears under-specified, so that there may not be any guarantee that the results of begin() and end() can ever be used together, even without mixing const and non-const versions.

The reason that C++11 made COW explicitly disallowed is precisely this problem: your code is compliant and should result in `6`, but obviously doesn't. The COW implementation is _not_ compliant. — Lightness Races in Orbit, Feb 26 '15 at 17:19
libc++ gets this right. [Live](http://coliru.stacked-crooked.com/a/e0d48d1f709b1eb8). — Baum mit Augen, Feb 26 '15 at 17:21
@BaummitAugen For some definition of "right". The code in the question isn't legal pre-C++11, and it won't work (or isn't guaranteed) with pre-C++11 libraries (which includes the standard library delivered with g++). The library isn't wrong if it fails; the code is. — James Kanze, Feb 26 '15 at 17:56
@JamesKanze "Right" as defined by the standard I compiled against of course. My comment was not meant to be an answer, but a comment. — Baum mit Augen, Feb 27 '15 at 05:02

score 6 · Accepted Answer · answered Feb 26 '15 at 17:19

As you say, C++11 differs from earlier versions in this regard. There's no problem in C++11 because all attempts to allow copy on write were removed. In pre-C++11, your code results in undefined behavior; the call s2.end() is allowed to invalidate existing iterators (and did, and maybe still does, in g++).

Note that even if s2 were not a copy, the standard would allow it to invalidate iterators. In fact, the CD for C++98 even made things like f( s.begin(), s.end() ) or s[i] == s[j] undefined behavior. This was only realized at the last minute, and corrected so that only the first call to begin(), end() or [] could invalidate the iterators.

"References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object: Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend()." From C++03. — Lightness Races in Orbit, Feb 26 '15 at 17:38

score 2 · Answer 2 · answered Feb 26 '15 at 17:12


The code is OK: a CoW implementation is pretty much required to unshare when there is a danger that an iterator or a reference to an element is being held. That is, when something has accessed an element in one string and a copy of that string ventures to do the same, i.e., use an iterator or the subscript operator, the copy has to be unshared. An implementation could also keep track of its iterators and update them as needed.

Of course, in a concurrent system it is near impossible to do all this without data races but pre-C++11 there are no data races.

Dietmar Kühl

The code isn't correct, since unsharing in the call to `s2.end()` will invalidate the iterator returned by the previous call to `s2.begin()` (through a const reference). (Also, of course: you forgot a word or two in the last sentence. Use a mutex correctly, and it's simple to avoid any data races. What you doubtlessly mean is "is near impossible to do this without data races _and_ with acceptable performance".) – James Kanze Feb 26 '15 at 17:23
@JamesKanze: if unsharing upon use of `s2.end()` would invalidate things, the unsharing would need to happen upon the call to `s2.begin()`. – Dietmar Kühl Feb 26 '15 at 17:59
Not as I understand it. His call to `begin()` is through a `const` lvalue, so he's calling `begin() const`. Unsharing will invalidate iterators, and an implementation is not allowed to invalidate iterators when `begin() const` is called. – James Kanze Feb 26 '15 at 18:12
@JamesKanze: the rules about when iterators can be invalidated didn't really change. A CoW implementation always had to track whether an iterator or a reference to an element was taken. Upon taking the first iterator or element reference on a second string, it needs to unshare: at this point there is no other iterator or reference which can be invalidated. – Dietmar Kühl Feb 26 '15 at 18:27
Upon taking the first non-const iterator or element reference, the implementation needs to unshare. This is what the last point in §21.3/5 is getting at. – James Kanze Feb 27 '15 at 23:13

score 2 · Answer 3 · edited May 23 '17 at 12:21


As of N3337 (which is essentially identical to C++11), the specification reads ([string.require]/4):

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
[...]
- Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

At least as I'd read it, this means that a call to begin or end is not allowed to invalidate any iterators. Although not stated directly, I'd also take this as meaning that no call to a const member function can invalidate any iterators.

This wording remains the same at least up through n4296.

answered Feb 26 '15 at 17:12

Jerry Coffin

n4296 postdates C++14, so this doesn't answer the question about C++98 and C++11. However, the conclusion is the same in those standards due to the same (or similar) wording. – Lightness Races in Orbit Feb 26 '15 at 17:15
The same text exists in C++98 and C++11; however, if you look at the requirements on iterators, they all appear to hold for my variable `begin` after the call to `end()`. As such, it seems like `begin` has not technically been invalidated at all, even though it's unusable with the result of `end()`. It's just that there don't appear to be any requirements on `begin()` or `end()` that require their results to be usable together. – bames53 Feb 26 '15 at 17:20
The requirements in C++ pre C++11 are significantly different. Through the CD2 of C++98, any call to the non-const `[]`, `at()`, `begin()` or `end()` could invalidate iterators, references and pointers. Between the CD2 and C++98, the committee tried to fix this, by saying that only the first call could invalidate iterators. In C++11, they changed it to say that no call could, which effectively banned copy on write. In the presented code, the program acquires an iterator through a call to `begin() const`, then calls `end()`, which pre-C++11 could invalidate the first iter. – James Kanze Feb 26 '15 at 17:29

score 1 · Answer 4 · edited Jun 20 '20 at 09:12

C++98 [lib.basic.string]/5 states:

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of the basic_string object:

- As an argument to non-member functions swap(), operator>>(), and getline().
- As an argument to basic_string::swap().
- Calling data() and c_str() member functions.
- Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend().
- Subsequent to any of the above uses except the forms of insert() and erase() which return iterators, the first call to non-const member functions operator[](), at(), begin(), rbegin(), end(), or rend().

Since the constructor of `s2` is a "non-const member function", it is conforming for the call to non-const `s2.end()` (the first such call, per the last bullet above) to invalidate iterators. The program therefore does not have defined behavior per C++98.

I won't comment on C++11 as I think the other answers explain clearly that the program has defined behavior in that context.
