The key point is the last point in the C++03 standard. The
wording could be a lot clearer, but the intent is that only the
first call to [], at, etc. after something which established new
iterators (and thus invalidated old ones) may invalidate
iterators; later calls may not. The wording in C++03 was, in
fact, a quick hack, inserted in response to comments by the
French national body on the CD2 of C++98. The original problem
is simple. Consider:
std::string a( "some text" );
std::string b( a );
char& rc = a[2];
At this point, modifications through rc must affect a, but not
b. If COW is being used, however, when a[2] is called, a and b
share a representation; in order for writes through the returned
reference not to affect b, a[2] must be considered a "write",
and be allowed to invalidate existing references and iterators.
Which is what CD2 said: any call to a non-const [], at, or one
of the begin or end functions could invalidate iterators and
references. The French national body comments pointed out that
this rendered a[i] == a[j] invalid, since the reference returned
by one of the [] would be invalidated by the other. The last
point you cite of C++03 was added to circumvent this: only the
first call to [] et al. could invalidate the iterators.
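The requirement itself can be checked directly against
std::string. What follows is only a minimal sketch (the helper
name reference_write_is_private is made up for illustration); it
should hold on any conforming implementation, whether or not it
uses COW:

```cpp
#include <string>

// Hypothetical helper, not part of any library: returns true when a write
// through the reference obtained from a[2] changes a but leaves the copy b alone.
bool reference_write_is_private()
{
    std::string a( "some text" );
    std::string b( a );     // under COW, a and b would share a representation here
    char& rc = a[2];        // non-const []: a COW implementation has to un-share now
    rc = 'X';               // write through the returned reference
    return a == "soXe text" && b == "some text";
}
```

A COW implementation can only satisfy this by treating the call
to a[2] as a potential write, which is exactly why that call
has to be allowed to invalidate things.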
I don't think anyone was totally happy with the results. The
wording was done quickly, and while the intent was clear to
those who were aware of the history, and the original problem,
I don't think it was fully clear from the standard. In addition,
some experts began to question the value of COW to begin with,
given that the string class itself cannot reliably detect all
writes. (If a[i] == a[j] is the complete expression, there is
no write. But the string class itself must assume that the
return value of a[i] may result in a write.)
And in a multi-threaded environment, the cost of managing the
reference count needed for copy on write was deemed a relatively
high cost for something you usually don't need. The result is
that most implementations (which supported threading long before
C++11) have been moving away from COW anyway; as far as I know,
the only major implementation still using COW was g++ (but there
was a known bug in their multithreaded implementation) and
(maybe) Sun CC (which, the last time I looked at it, was
inordinately slow, because of the cost of managing the counter).
I think the committee simply took what seemed to them the
simplest way of cleaning things up, by forbidding COW.
EDIT:
Some more clarification with regards to why a COW implementation
has to invalidate iterators on the first call to []. Consider
a naïve implementation of COW. (I will just call it String, and
ignore all of the issues involving traits and allocators, which
aren't really relevant here. I'll also ignore exception and
thread safety, just to make things as simple as possible.)
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy, std::strlen
#include <new>       // ::operator new, ::operator delete

class String
{
    struct StringRep
    {
        int useCount;
        std::size_t size;
        char* data;
        StringRep( char const* text, std::size_t size )
            : useCount( 1 )
            , size( size )
            , data( static_cast< char* >( ::operator new( size + 1 ) ) )
        {
            std::memcpy( data, text, size );
            data[size] = '\0';
        }
        ~StringRep()
        {
            ::operator delete( data );
        }
    };
    StringRep* myRep;
public:
    String( char const* initial_text )
        : myRep( new StringRep( initial_text, std::strlen( initial_text ) ) )
    {
    }
    String( String const& other )
        : myRep( other.myRep )      // share the representation
    {
        ++ myRep->useCount;
    }
    ~String()
    {
        -- myRep->useCount;
        if ( myRep->useCount == 0 ) {
            delete myRep;
        }
    }
    char& operator[]( std::size_t index )
    {
        // Naive: hands out a reference into the (possibly shared) buffer.
        return myRep->data[index];
    }
};
Now imagine what happens if I write:
String a( "some text" );
String b( a );
a[4] = '-';
What is the value of b after this? (Run through the code by
hand, if you're not sure.)
Obviously, this doesn't work. The solution is to add a flag,
bool uncopyable;
to StringRep, which is initialized to false, and to modify the
following functions:
String::String( String const& other )
{
    if ( other.myRep->uncopyable ) {
        // A reference into other's buffer may be outstanding: deep copy.
        myRep = new StringRep( other.myRep->data, other.myRep->size );
    } else {
        myRep = other.myRep;
        ++ myRep->useCount;
    }
}
char& String::operator[]( size_t index )
{
    if ( myRep->useCount > 1 ) {
        // Shared: detach from the other users before handing out a reference.
        -- myRep->useCount;
        myRep = new StringRep( myRep->data, myRep->size );
    }
    myRep->uncopyable = true;
    return myRep->data[index];
}
This means, of course, that [] will invalidate iterators and
references, but only the first time it is called on an object.
The next time, the useCount will be one (and the representation
will be uncopyable). So a[i] == a[j] works; regardless of which
the compiler actually evaluates first (a[i] or a[j]), the second
one will find a useCount of 1, and will not have to duplicate.
And because of the uncopyable flag,
String a( "some text" );
char& c = a[4];
String b( a );
c = '-';
will also work, and not modify b.
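Putting the pieces together, the scheme described above can be
sketched as one compilable unit. This is only an illustration
under a few assumptions of mine: the c_str accessor and the three
checking functions are not part of the code above and exist only
so the behaviour can be observed, new char[] stands in for the
raw ::operator new, and assignment is deliberately left out. It
is single-threaded and not exception safe.

```cpp
#include <cstddef>
#include <cstring>

class String
{
    struct StringRep
    {
        int useCount;
        bool uncopyable;    // true once a reference into data may have escaped
        std::size_t size;
        char* data;

        StringRep( char const* text, std::size_t length )
            : useCount( 1 )
            , uncopyable( false )
            , size( length )
            , data( new char[length + 1] )
        {
            std::memcpy( data, text, length );
            data[length] = '\0';
        }
        ~StringRep() { delete[] data; }
    };
    StringRep* myRep;

    String& operator=( String const& );  // declared, never defined: out of scope here

public:
    String( char const* initial_text )
        : myRep( new StringRep( initial_text, std::strlen( initial_text ) ) )
    {
    }
    String( String const& other )
    {
        if ( other.myRep->uncopyable ) {
            myRep = new StringRep( other.myRep->data, other.myRep->size );
        } else {
            myRep = other.myRep;
            ++ myRep->useCount;
        }
    }
    ~String()
    {
        if ( -- myRep->useCount == 0 ) {
            delete myRep;
        }
    }
    char& operator[]( std::size_t index )
    {
        if ( myRep->useCount > 1 ) {     // shared: detach before handing out a reference
            -- myRep->useCount;
            myRep = new StringRep( myRep->data, myRep->size );
        }
        myRep->uncopyable = true;
        return myRep->data[index];
    }
    // Assumed accessor, added only so the checks below can read the contents.
    char const* c_str() const { return myRep->data; }
};

// Copy first, then write through []: b must keep the old text.
bool write_after_copy_is_private()
{
    String a( "some text" );
    String b( a );
    a[4] = '-';
    return std::strcmp( a.c_str(), "some-text" ) == 0
        && std::strcmp( b.c_str(), "some text" ) == 0;
}

// Take a reference first, then copy: the uncopyable flag forces a deep copy.
bool copy_after_reference_is_private()
{
    String a( "some text" );
    char& c = a[4];
    String b( a );
    c = '-';
    return std::strcmp( a.c_str(), "some-text" ) == 0
        && std::strcmp( b.c_str(), "some text" ) == 0;
}

// Two calls to [] on a shared string: only the first one reallocates,
// so the first reference is still usable after the second call.
bool second_call_keeps_first_reference()
{
    String a( "some text" );
    String b( a );
    char& x = a[0];
    char& y = a[1];
    x = 'S';
    y = 'O';
    return std::strcmp( a.c_str(), "SOme text" ) == 0
        && std::strcmp( b.c_str(), "some text" ) == 0;
}
```

All three functions should return true; with the naive
operator[] shown earlier, the first one would not, since a and b
would still be sharing one buffer when the write happens.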
Of course, the above is enormously simplified. Getting it to
work in a multithreaded environment is extremely difficult,
unless you simply grab a mutex for the entire function for any
function which might modify anything (in which case, the
resulting class is extremely slow). G++ tried, and
failed: there is one particular use case where it breaks.
(Getting it to handle the other issues I've ignored is not
particularly difficult, but does represent a lot of lines of
code.)