28

I have seen many times that std::string::operator[] does not do any bounds checking. Even in "What is the difference between string::at and string::operator[]?", asked in 2013, the answers say that operator[] does not do any bounds checking.

My issue with this is that when I look at the standard (in this case draft N3797), [string.access] says:

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
  1. Requires: pos <= size().
  2. Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
  3. Throws: Nothing.
  4. Complexity: constant time.

This leads me to believe that operator[] has to do some sort of bounds checking to determine whether it needs to return an element of the string or a default charT. Is this assumption correct, and is operator[] now required to do bounds checking?

NathanOliver
  • Not only does `std::basic_string::operator[]` not do bounds checking, it cannot and must not do bounds checking to be compliant. The standard specifically says this function **throws nothing**. – David Hammen Jul 21 '16 at 16:11
  • 8
    @DavidHammen Do you have to throw something to do bounds checking? Couldn't a conforming implementation check if `pos >= size()` and return `charT()` if it wants to? – NathanOliver Jul 21 '16 at 16:14
  • 3
    @DavidHammen: `exit()` doesn't throw and appears standards conforming. – Mooing Duck Jul 21 '16 at 19:36
  • 2
    @DavidHammen Requiring `pos <= size()` means if `pos > size()` the behavior is undefined. So it can still legally throw something. – johnchen902 Jul 22 '16 at 03:51

5 Answers

47

The wording is slightly confusing, but if you study it in detail you'll find that it's actually very precise.

It says this:

  • The precondition is that the argument to [] is either = n or it's < n.
  • Assuming that precondition is satisfied:
    • If it's < n then you get the character you asked for.
    • "Otherwise" (i.e. if it's n) then you get charT() (i.e. the null character).

But no rule is defined for when you break the precondition, and the check for = n can be satisfied implicitly (but isn't explicitly mandated to be) by actually storing a charT() at position n.

So implementations don't need to perform any bounds checking… and the common ones won't.
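The boundary case described above can be observed with a short sketch. The helper names `terminator_is_null` and `largest_valid_index` are made up for illustration; the only library calls used are `std::string::operator[]` and `size()`. It assumes a C++11 or later library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if reading s[s.size()] yields charT().
// pos == size() satisfies the precondition pos <= size(), so this read is
// well defined and no out-of-range access takes place.
bool terminator_is_null(const std::string& s) {
    return s[s.size()] == char();
}

// Hypothetical helper: the largest index operator[] accepts is size() itself.
// Passing anything larger breaks the precondition, and the standard says
// nothing about what happens then (undefined behavior).
std::size_t largest_valid_index(const std::string& s) {
    return s.size();
}
```

For a string holding "abc", `terminator_is_null` reads index 3 and compares it with `char()`. Nothing in the sketch can show what index 4 does, because that case has no defined result to demonstrate.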

Lightness Races in Orbit
  • 7
    *"But no rule is defined for when you break the precondition"* is probably the most important part of your answer ;) – Holt Jul 21 '16 at 14:14
  • @Holt: Almost; the OP is really asking how the "otherwise" clause can be triggered without bounds checking! – Lightness Races in Orbit Jul 21 '16 at 14:14
  • 1
    I get tripped up by another precondition. I need to remember breaking those is UB regardless what the rest of it says. Thanks. – NathanOliver Jul 21 '16 at 14:16
  • 3
    Perhaps it is interesting to remember that c_str() and data() perform the same function since C++11 and they don't invalidate iterators. This implies that a charT() is stored at position n. – EFenix Jul 21 '16 at 14:33
  • 2
    @AntonioGarrido I guess an arcane implementation still doesn't have to. In my understanding it could still check for the `n == pos` case and return a reference to a static `charT` instance, while `data()` and `c_str()` could allocate a copy of the internal state plus a default-constructed `charT` at the end. Not feasible, but IMHO allowed. – Superlokkus Jul 21 '16 at 15:31
  • @Superlokkus Right, upvote for you. My answer is motivated by Josuttis' book, where it is explained that [] does not check bounds, so the only possibility is to add the charT() at the end. However, you are right, the standard does not imply it is the only solution. – EFenix Jul 21 '16 at 15:44
  • Well, since C++11 `operator[]` has to return maybe-const `charT&`. So I think by contiguousness we *must* have `&mystr[n] == (&*mystr.begin()) + n`. Maybe I'm being dim, but I don't see how claims about shadow copies of the string without a nul at the end can actually work. The language is that `mystr[mystr.size() - 1]` returns a `&charT` to the last character of the string, and that `*(&that_thing+1)` must be a nul. So the implementation must store a nul after the last character. It can play silly games, but as soon as you access any character it can't avoid storing a nul after the last. – Steve Jessop Jul 21 '16 at 18:11
  • @SteveJessop: The contiguity applies to the "contents" of the string, not to the apparent terminating null. That `c_str()` must operate in constant time is a much more interesting constraint on implementations. – Lightness Races in Orbit Jul 21 '16 at 20:19
  • @LightnessRacesinOrbit: ah, so my argument fails since `*(&mystr[mystr.size()-1] + 1)` is UB, because contiguity only runs to `size()-1`, not to `size()`. That's what I didn't realise. – Steve Jessop Jul 22 '16 at 00:48
  • @SteveJessop As you are already aware, you were right and wrong. Yes, the constant complexity constraint makes such an arcane implementation even harder, but imagine a C++ implementation ridiculously optimized for saving space: it desperately tries to save the extra null byte at the end of every string... Actually, scratch everything I just said. I just read that you're right, even in the part where you thought you were not: C++14 §21.4.7.1 states that it must be contiguous from 0 up to size(), not only size()-1. – Superlokkus Jul 22 '16 at 09:40
  • @Superlokkus: Up to and including? Or up to? The C++ standard typically talks in terms of half-open ranges. – Lightness Races in Orbit Jul 22 '16 at 10:56
  • 2
    @LightnessRacesinOrbit I know, usually, but §21.4.7.1 from N4140 states in line 1: "in [0,size()]", which due to ] instead of ) is pretty including to me ;-) Or to quote a bit more: "1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()]" – Superlokkus Jul 22 '16 at 11:14
  • @Superlokkus: Yep okay - then Steve's right and this implementation is indeed mandated. – Lightness Races in Orbit Jul 22 '16 at 11:19
14

operator[] has to do some sort of bounds checking to determine...

No it doesn't. With the precondition

Requires: pos <= size().

it can just ASSUME that it can always return an element of the string. If this condition isn't met: Undefined behaviour.

The operator[] will likely just increment the pointer from the start of the string by pos. If the string is shorter, well then it just returns a reference to the data behind the string, whatever it might be. Like a classic out of bounds in simple C arrays.

To fulfil the case where pos == size(), it could simply have allocated an extra charT at the end of its internal string data. Then just incrementing the pointer, without any checks, would still deliver the stated behaviour.
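A minimal sketch of that idea, assuming a hypothetical toy class `TinyString` (this is not how any particular standard library is written): the buffer holds `size() + 1` elements, the last one is `char()`, and the subscript operator contains no comparison at all.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical toy class for illustration only.
class TinyString {
public:
    explicit TinyString(const char* text)
        : size_(std::strlen(text)), data_(new char[size_ + 1]) {
        std::memcpy(data_.get(), text, size_);
        data_[size_] = char();  // the extra element stored at position size()
    }

    std::size_t size() const { return size_; }

    // Unchecked precondition: pos <= size().
    // The body is plain pointer arithmetic; pos is never compared to size_.
    const char& operator[](std::size_t pos) const { return data_[pos]; }

private:
    std::size_t size_;              // declared first, so initialized first
    std::unique_ptr<char[]> data_;  // owns size_ + 1 characters
};
```

With `TinyString t("abc")`, `t[3]` lands on the stored terminator, so the "otherwise" branch of the specification is satisfied without any branch in the code. `t[4]` would read past the allocation, which is exactly the undefined behaviour the precondition leaves open.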

Rakete1111
Superlokkus
4

First, there is a Requires clause: pos <= size(). If you violate the Requires clause, your program behaves in an undefined manner.

So the language only defines what happens when pos <= size().

The next paragraph states that for pos < size(), it returns a reference to an element in the string. And for pos == size(), it returns a reference to a default constructed charT with value charT().

While this may look like bounds checking, in practice what actually happens is that the std::basic_string allocates a buffer one larger than asked and populates the last entry with a charT(). Then [] simply does pointer arithmetic.

I have tried to come up with a way to avoid that implementation. While the standard does not mandate it, I could not convince myself an alternative exists. There was something annoying with .data() that made it difficult to avoid the single buffer.
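The constraint from `.data()` can be probed with a small check, based on the wording quoted in the comments from N4140 (`p + i == &operator[](i)` for each `i` in `[0,size()]`). The function name `data_matches_subscript` is made up for this sketch, and it assumes a C++11 or later library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks that data() + i == &s[i] for every i in the
// closed range [0, size()], including the terminator position size().
bool data_matches_subscript(const std::string& s) {
    const char* p = s.data();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return true;
}
```

If the terminator lived anywhere other than directly after the last character, the final iteration (`i == s.size()`) would fail, which is why a single buffer with one extra element is so hard to avoid.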

Yakk - Adam Nevraumont
2

This operator of standard containers emulates the behavior of operator [] of ordinary arrays, so it does not make any checks. However, in debug mode the library may provide this checking.

If you want to check the index then use member function at() instead.
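As a rough illustration of the difference, the sketch below wraps `at()` in a hypothetical helper, `at_rejects`, that reports whether the call threw `std::out_of_range`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if s.at(pos) throws std::out_of_range,
// false if the access succeeds.
bool at_rejects(const std::string& s, std::size_t pos) {
    try {
        (void)s.at(pos);  // checked access: throws when pos >= size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Note that `at()` is stricter than `operator[]` at the boundary: for a three-character string, `at(3)` throws, whereas `operator[](3)` is within its precondition and refers to the null character.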

Vlad from Moscow
1

http://en.cppreference.com/w/cpp/string/basic_string/operator_at

Returns a reference to the character at specified location pos. No bounds checking is performed.

(Emphasis mine).

If you want bounds checking, use std::basic_string::at

The standard doesn't imply that the implementation needs to provide bounds checking, because it basically describes what an unchecked array access does.

If you access within bounds, it's defined. If you step outside, you trigger undefined behavior.

Petr Skocik
  • 8
    Quoting a non-canonical website against the standard is not evidence. – Yakk - Adam Nevraumont Jul 21 '16 at 14:10
  • 3
    @Yakk Quoting a highly regarded reference site that has no doubt done its due diligence to make sure what it says complies with the standard isn't a proof, but it is a pretty strong hint that the reasoning I've provided in the 2nd half of the answer is correct. – Petr Skocik Jul 21 '16 at 14:23
  • The standard doesn't imply any need for bounds checking; instead it quite clearly says that the implementor only needs to code for n <= size(). Anything else is undefined behavior, no checking required. – kfsone Jul 22 '16 at 06:01