49
std::string my_string = "";
char test = my_string[0];

I've noticed that this doesn't crash, and every time I've tested it, test is 0.

Can I depend on it always being 0, or is it arbitrary?

Is this bad programming?

Edit: From some comments, I gather that there is some misunderstanding about the usefulness of this.

The purpose of this is NOT to check to see if the string is empty. It is to not need to check whether the string is empty.

The situation is that there is a string that may or may not be empty. I only care about the first character of this string (if it is not empty).

It seems to me, it would be less efficient to check to see if the string is empty, and then, if it isn't empty, look at the first character.

if (! my_string.empty())
    test = my_string[0];
else
    test = 0;

Instead, I can just look at the first character without needing to check to see if the string is empty.

test = my_string[0];
beauxq
  • 24
    Use `std::string::empty`. – 101010 Oct 12 '15 at 14:08
  • From the online reference: `Accessing the value at data()+size() produces undefined behavior: There are no guarantees that a null character terminates the character sequence pointed by the value returned by this function. See string::c_str for a function that provides such guarantee.` – Logicrat Oct 12 '15 at 14:10
  • 4
    @Logicrat: You are either using an old reference that gives the rules for C++98, or else looked up the wrong function. – Ben Voigt Oct 12 '15 at 14:19
  • 1
    What is "the" online reference, @Logicrat? – Lightness Races in Orbit Oct 12 '15 at 14:23
  • Why would you want to use `string[0]` for an empty string? For mere testing, use `string::empty()`. – Walter Oct 12 '15 at 15:52
  • Note that the "trick" that you're using here might be from a programmer that used to use the method to test if a C String was empty. A C-style string is just a character array, and if the first character, `[0]` is zero, then the string is empty. This works because the C String is just an array and not a class that overrides the `[]` operator. – JPhi1618 Oct 12 '15 at 19:39
  • 14
    Aside: a string whose first character is zero is not necessarily an empty string! –  Oct 12 '15 at 22:25
  • @Hurkyl I think he is checking `(test == 0)` not `(test == '0')`. 0 in ASCII is `null` – Ahmed Nawar Oct 13 '15 at 22:13
  • 3
    @Ahmed: `std::string x = "123"; x[0] = 0; assert(x[0] == 0 && !x.empty());` –  Oct 13 '15 at 22:28
  • @Hurkyl I see your point, you are right – Ahmed Nawar Oct 14 '15 at 17:37

2 Answers

70

C++14

No, it is not arbitrary; you can depend on it.

In 21.4.5.2 (or [string.access]) we can find:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

In other words, when pos == size() (which is true when both are 0), the operator returns a reference to a value-initialized character (charT(), which is '\0' for char) that you are forbidden to modify.

It is not special-cased for the empty (or 0-sized) strings and works the same for every length.
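
A minimal sketch of the rule quoted above, assuming a C++11 or later compiler. The helper names `char_at_size` and `check_index_at_size` are hypothetical and exist only for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the standard): reads the character at
// index size() without modifying it.
char char_at_size(const std::string& s) {
    return s[s.size()];  // pos == size(): reference to charT(), '\0' for char
}

// Exercises the rule for an empty and a non-empty string.
void check_index_at_size() {
    std::string empty;
    std::string word = "abc";

    char c = empty[0];                   // non-const operator[], pos == size() == 0
    assert(c == '\0');
    assert(char_at_size(word) == '\0');  // pos == size() == 3
    assert(word[0] == 'a');              // pos < size(): an ordinary element

    // empty[0] = 'x';  // undefined behavior: the referenced object must not be modified
}
```

Reading the value is fine for any length; only writing through the returned reference is off limits.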


C++03

And most certainly C++98 as well.

It depends.

Here's 21.3.4.1 from the official ISO/IEC 14882:

Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.
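
Under this older wording only the const overload is defined for pos == size(). One way to make sure that overload is selected is to read through a const reference. This is a sketch; the helper names `first_char_const` and `check_first_char_const` are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the parameter is a const reference, so overload
// resolution picks the const operator[], which returns charT() when
// pos == size().
char first_char_const(const std::string& s) {
    return s[0];
}

// Small self-check for the empty and the non-empty case.
void check_first_char_const() {
    std::string empty;
    std::string text = "xy";
    assert(first_char_const(empty) == '\0');  // pos == size() == 0
    assert(first_char_const(text) == 'x');    // pos < size()
}
```

Calling `my_string[0]` directly on a non-const `std::string`, as in the question, selects the non-const overload instead.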

Bartek Banachewicz
  • 6
    Note that before C++11, the non-const version of `operator[]` would result in undefined behavior in this case (even if you don't modify the resulting reference). – interjay Oct 12 '15 at 14:15
  • This is an excellent answer. Can you provide information about your source? – Benilda Key Oct 12 '15 at 14:20
  • 1
    @BenKey: That's the numbering system used by the C++ Standard itself. – Ben Voigt Oct 12 '15 at 14:21
  • 2
    @BenKey For the first quote, I used a very useful [online render of (I think) last C++14 draft](http://eel.is/c++draft/string.access). For the second, it was the original ISO PDF. Both numbers represent sections in said document, as Ben V. pointed out. – Bartek Banachewicz Oct 12 '15 at 14:23
  • But if a reference is returned, it means that _somebody_, _somewhere_ must have constructed a char and initialized it to 0. (Unless the compiler optimizes that away.) What made the C++ committee make this the standard? – einpoklum Oct 12 '15 at 18:48
  • I disagree that "it depends" when using C++03. It's **always** bad to depend on undefined behaviour. It's possible to evade the UB by changing the code to use the const version of `operator[]`, but what's shown is using the non-const overload and that's unsafe. – Toby Speight Dec 05 '18 at 14:02
  • @TobySpeight If `pos < size()`, there's no undefined behavior. – Bartek Banachewicz Dec 05 '18 at 14:06
  • 1
    Yes, but here we're looking at `pos == size()` (both equal to `0`). – Toby Speight Dec 05 '18 at 14:07
  • Since C++11 it is well documented and defined behaviour, before that it was still common. If it is only for checking the first character then this is a legitimate, simple and concise way of doing that. – ABaumstumpf Feb 17 '21 at 12:14
32

@Bartek Banachewicz's answer explains which circumstances allow you to make your assumption. I would like to add that

This is bad programming.

Why? For several reasons:

  1. You have to be a language lawyer just to be sure this isn't a bug. I wouldn't know the answer if not for this page, and frankly - I don't think you should really bother to know either.
  2. People without the intuition of a string being a null-terminated sequence of characters will have no idea what you're trying to do until they read the standard or ask their friends.
  3. Breaks the principle of least astonishment in a bad way.
  4. Goes against the principle of "writing what you mean", i.e. having the code express problem-domain concepts.
  5. Sort-of-a use of a magic number (it's arguable whether 0 actually constitutes a magic number in this case).

Shall I continue? ... I'm almost certain you have an alternative superior in almost every respect. I'll even venture a guess that you've done something else that's "bad" to manipulate yourself into wanting to do this.
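
One possible shape for such an alternative, as a sketch only. The helper name `first_char_or` and its default fallback are hypothetical, chosen here for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the first character of s, or `fallback`
// when s is empty, so the empty case is spelled out instead of implied.
char first_char_or(const std::string& s, char fallback = '\0') {
    return s.empty() ? fallback : s[0];
}

// Small self-check covering both branches.
void check_first_char_or() {
    assert(first_char_or("abc") == 'a');
    assert(first_char_or("") == '\0');
    assert(first_char_or("", '?') == '?');
}
```

A call such as `test = first_char_or(my_string);` states the intent at the call site and does not rely on the reader knowing what `operator[]` does at `size()`.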

Always remember: other people, who will not be consulting you, will sooner or later need to maintain this code. Think of them, not just of yourself, who can figure it out. Plus, a decade from now, who's to say you're going to remember your own trick? You might be that confounded maintainer...

einpoklum
  • Counterpoint: one might hope that ten years from now, the validity of arbitrary indices into a `std::string` would be common knowledge. –  Oct 12 '15 at 22:24
  • 3
    -1: It's not bad programming. The behavior is defined, just like s[-2] in many languages returns the next-to-last character in the string. Yes, there will be some C++ programmers who don't know this behavior is defined, and a comment may be in order. But I wouldn't add a single line of code if s[0] was adequate. – kevin cline Oct 13 '15 at 06:21
  • 2
    @kevincline: The fact that something is defined does not mean it should be used. In fact, behavior which requires specific definition for a marginal case, and which could well have been different, is the kind of behavior it is often (not always) better to avoid. Also, terseness is nice, but: 1. There are other ways to achieve it. 2. You must still balance terseness with clarity and not just sacrifice the latter for the former. – einpoklum Oct 13 '15 at 06:51
  • implementing minus indexes would be code that all have to pay for but very few will use, so it will not make it in C++. – Surt Oct 13 '15 at 07:40
  • @kevincline: s[0] or s[-2] is just fine, that's not the issue. The problem is using s[0] or s[-2] or what-not on an empty string. It's confusing: If you know the string is empty, why are you trying to access its characters? Just don't. – einpoklum Oct 13 '15 at 08:53
  • @einpoklum: it's only confusing once. Then it becomes idiomatic. – kevin cline Oct 13 '15 at 18:01
  • 2
    Something does not become idiomatic because one person, or a few people, use it. Also, a good idiom is not initially confusing, I would say. – einpoklum Oct 13 '15 at 18:41
  • @einpoklum: While I agree with the pervasiveness aspect of your comment, I have to disagree with the confusing bit -- especially when the source of confusion is less about what the idiom actually is and more about the fact people have spent the last ten years internalizing the consequences of the fact this feature used to not exist. –  Oct 13 '15 at 23:54
  • @Hurkyl: I'm not the most experienced C++ programmer, to say the least, but I don't recall noticing people using `std::string` itself rather than `char *` functions and `.c_str()` a lot, ever trying to get characters from an empty string. Or - maybe I'm not quite following. – einpoklum Oct 14 '15 at 09:06
  • 3
    Defined behaviour or not: If I'd read code similar to the first two lines of the OP, I'd go WTF. And remember, even in C++14 you *must not* modify the returned reference or you are in for UB again. Definitely bad, bad style. – TobiMcNamobi Oct 14 '15 at 15:45
  • Just FTR: _"You have to be a language lawyer just to be sure this isn't a bug."_ Oh, you have to be a language lawyer already to just use C++!... (Try initializing a variable, for instance. :) ) IOW, you're going to be consulting the references all the time, just to survive, anyway. (And this is actually one of the common things you'll likely learn and remember, unlike countless other landmine language features. Speaking from 30 years of C++ experience.) – Sz. Apr 26 '23 at 13:04
  • @Sz.: I sympathize, but there are different degrees of lawyering. And initializing a variable with `MyType myvar { initializer };` can be your go-to without having to think about it. – einpoklum Apr 26 '23 at 15:52