Consider the following from C++11:

[C++11: 21.4.5]: basic_string element access                           [string.access]

const_reference operator[](size_type pos) const;
reference       operator[](size_type pos);

1     Requires: pos <= size().

2     Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

3     Throws: Nothing.

4     Complexity: constant time.

This means either:

  • The referenced value in the pos == size() case shall not be modified, or
  • In any case, the referenced value returned by op[] shall not be modified, even for the non-const overload.

The second scenario seems completely ridiculous, but I think it's what the wording most strongly implies.

Can we modify what we get from std::string::op[], or not? And is this not rather ambiguous wording?

Lightness Races in Orbit
  • Looks like you can...but shouldn't. XD Wait, isn't the answer in the definition of SHOULD? – Klaim Oct 14 '11 at 10:07
  • @Klaim: It's about what it is that "the referenced value shall not be modified" is talking about. – Lightness Races in Orbit Oct 14 '11 at 10:08
  • I wonder how this compares to `vector::op[]`... interesting, I can't find anything where they talk about this... – Nim Oct 14 '11 at 10:11
  • @Nim: Interestingly, Table 101 calls `basic_string` a "container". News to me. – Lightness Races in Orbit Oct 14 '11 at 10:13
  • @Nim: Sequence containers do not appear to have this ambiguity, and I'd assume that this is because it does not have that dereferenceable sentinel value, and thus that the first scenario is correct. But it's by no means certain. – Lightness Races in Orbit Oct 14 '11 at 10:14
  • A very good question, I was under the impression that this was now allowed in c++11 but it's clear that there are some grounds to doubt it... – jcoder Oct 14 '11 at 10:16
  • Reading and re-reading, I would agree with @rodrigo and his comment below, you must not modify the reference in the case where `pos >= size`, otherwise, there is no specific restriction (so I guess implementation defined). – Nim Oct 14 '11 at 10:19
  • @Nim: If I get a non-`const` ref back and there's no restriction mentioned, then it had better not be implementation-defined as to whether I may modify the object through that ref. – Lightness Races in Orbit Oct 14 '11 at 10:21
  • I'd say the semicolon means that the sentence "the referenced value shall not be modified" applies in general, not just when `pos == size()`. – Kerrek SB Oct 14 '11 at 10:29
  • @Kerrek: Right, in terms of English that is implied, but in terms of common sense (and deductions from elsewhere in the document) it cannot be the case, surely. – Lightness Races in Orbit Oct 14 '11 at 10:31
  • @TomalakGeret'kal: It's really a shame that they chose to make it so ambiguous. Surely another half-sentence to explain what they mean wouldn't have hurt! Anyway, I'm tempted to stick with the immutable assumption simply because `data()` returns a const pointer. `vector` is no different from `string` in terms of allocation, yet `vector::data()` returns a mutable pointer, so that's making me cautious. – Kerrek SB Oct 14 '11 at 10:33
  • @KerrekSB: I think `data` is a red-herring and has nothing to do with it. – Lightness Races in Orbit Oct 14 '11 at 10:34
  • Actually, I've had a few other folks look purely at the language, and from an English language (not my first either), perspective, the `;` forces the latter condition to apply to the whole sentence, basically as it stands, *the reference value shall **not** be modified*. – Nim Oct 14 '11 at 10:37
  • I think that if on both cases the reference should not be modified, both should have returned a `const_reference`. – Eran Oct 14 '11 at 10:37
  • @Nim: I'm a bit of an English language expert (and it _is_ my first language) and, yes, that's the implication when considering the grammar alone. But that's the point of this question. – Lightness Races in Orbit Oct 14 '11 at 10:39
  • FWIW, assuming we take the sensible meaning I would write it, "`*(begin() + pos)` if `pos < size()`. Otherwise a reference to an object of type `charT` with value `charT()`; that object shall not be modified." – Steve Jessop Oct 14 '11 at 11:15
  • I'm told the wording is being fixed as a result of this question. :D – Lightness Races in Orbit Nov 10 '11 at 00:21

2 Answers

The quote means that you cannot modify the return of operator[]( size() ), even if the value is well defined. That is, you must not modify the NUL terminator in the string even through the non-const overload.

This is basically your first option. The restriction applies when pos >= size(), and because of the requirement pos <= size(), the only value that satisfies that condition is pos == size().

The actual English wording of the clause can be read as ambiguous (at least to me), but Appendix C, and in particular C.2.11, deals with changes in semantics in the string library, and it makes no mention of this change, which would break user code. In C++03 the "referenced value shall not be modified" bit is not present and there is no ambiguity. The lack of any mention in C.2.11 is not normative, but it can be taken as a hint that, when the standard was written, there was no intention of changing this particular behavior.
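
To make that reading concrete, here is a minimal sketch of what I take to be allowed and what is not. The function names (`to_upper_ascii`, `read_terminator`, `example_element_access`) are made up for illustration and are not from the standard; the sketch assumes the first interpretation above is the intended one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Writes through the non-const operator[] for every in-range position.
// This is the pos < size() case, where the result is *(begin() + pos).
std::string to_upper_ascii(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] = static_cast<char>(s[i] - 'a' + 'A');
        }
    }
    return s;
}

// Reads (never writes) the element at pos == size().
// The standard says this refers to an object with value charT().
char read_terminator(const std::string& s) {
    return s[s.size()];
}

// Hypothetical usage: modify in-range elements, only read at size().
void example_element_access() {
    assert(to_upper_ascii("abc") == "ABC");
    assert(read_terminator("abc") == '\0');
    // Not shown on purpose: writing through s[s.size()] on a non-const
    // string, which is the one thing the clause forbids.
}
```

Under this reading, the only operation to avoid is an assignment through `s[s.size()]`; everything at `pos < size()` behaves as it did in C++03.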

David Rodríguez - dribeas
  • That's the first possibility I mentioned (and yeah I know that it can be condensed to `pos == size()`), but can you be _sure_ that that is what it means? – Lightness Races in Orbit Oct 14 '11 at 10:10
  • If `operator[]` did not allow you to modify the contents even in the non-const overload, that would be a (non-sensical) break in the semantics from C++03, and it would be listed in the *C Compatibility* appendix of the standard. I am not a native English speaker, but common sense dictates this interpretation. – David Rodríguez - dribeas Oct 14 '11 at 10:13
  • Yes, it effectively means: if pos==size() then it returns a reference to a charT() that shall not be modified. I think that the comma and the semicolon are switched, because in English, `;` has more precedence than ','. – rodrigo Oct 14 '11 at 10:13
  • @DavidRodríguez-dribeas: I know; I said that the second possibility would be ridiculous. You're just repeating my question really :) – Lightness Races in Orbit Oct 14 '11 at 10:15
  • @rodrigo: Except a comma would be incorrect where the semicolon is currently. – Lightness Races in Orbit Oct 14 '11 at 10:18
  • @TomalakGeret'kal I am providing my interpretation, together with the additional bit that the break in the semantics would show up in Appendix C, specifically C.2.11, together with the other two changes in the semantics. No mention being done in that section can be used as a hint that the semantics in C++03 are maintained, and C++03 does not have any limitation on changing the values through `operator[]`. As of the question *am I sure?*, yes, I am sure. – David Rodríguez - dribeas Oct 14 '11 at 10:19
  • @DavidRodríguez-dribeas: That's pretty good reasoning actually. Fancy injecting it into your answer? :) – Lightness Races in Orbit Oct 14 '11 at 10:20
  • "I am not a native English speaker" - that's OK, at times I think the authors of the standard aren't either. And no doubt some of the people providing text truly aren't, so I definitely agree that an interpretation within the context of the rest of the standard is more important than a strict letter-of-English-grammar reading. – Steve Jessop Oct 14 '11 at 11:11
  • BTW, for "are you sure?" I was looking for more than just a "yes"! Your reasoning with Appendix C is good though. – Lightness Races in Orbit Oct 14 '11 at 12:16
  • @TomalakGeret'kal: Right, I should have added that *I am sure, but have been proven to be wrong many times* :) – David Rodríguez - dribeas Oct 14 '11 at 12:36

In N3690 (a C++14 draft), the wording has been changed to:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

I believe that this resolves the English ambiguity, and makes clear the intent of the original, ambiguous C++11 passage.
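
If you want to stay clear of that undefined behavior when the index comes from elsewhere, one option is to guard the write. This is only a sketch: `set_char_checked` and `example_checked_write` are hypothetical names, not anything the standard library provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: assigns c to s[pos] only when pos < s.size().
// Otherwise it returns false and leaves s untouched, so the object at
// pos == size() is never written to.
bool set_char_checked(std::string& s, std::size_t pos, char c) {
    if (pos >= s.size()) {
        return false;
    }
    s[pos] = c;
    return true;
}

// Hypothetical usage: an in-range write succeeds, a write at size() is refused.
void example_checked_write() {
    std::string s = "cat";
    assert(set_char_checked(s, 0, 'b'));
    assert(s == "bat");
    assert(!set_char_checked(s, s.size(), 'x'));
    assert(s == "bat");
}
```

Returning `bool` rather than throwing is just a design choice for the sketch; `std::string::at` is the standard alternative if you prefer an exception, and it throws for `pos >= size()`.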

Lightness Races in Orbit