
I'm trying to understand how strings really work in C++, because I got really confused after running into some unexpected behavior.

Considering a string, I insert a character (not using append()) using [] operator:

string str;
str[0] = 'a';

Let's print the string:

cout << "str:" << str << endl;

I get an empty string as output:

str:

Ok, let's try printing the only character in the string:

cout << "str[0]:" << str[0] << endl;

Output:

str[0]:a

Q1. What happened there? Why was a not printed in the first case?

Now, I do something that should throw a compilation error but it doesn't and my question is again, why.

str = 'ABC';

Q2. How's that not an incorrect semantic i.e. assigning a character (which is not really a character but essentially a string in single quotes) to a string?

Worse, when I print the string, it always prints the last character, C (I was expecting the first character, A):

cout << "str:" << str << endl;

Output:

str:C

Q3. Why was the last character printed, not first?

Duh
  • `string str;` makes an empty string. You need to tell it to be a size if you want to use `[]`. – NathanOliver Jan 04 '17 at 18:52
  • Why is this question being downvoted? I tried my best to answer my questions on my own but failed and that's why asking here. What is the use of this community if all people do is downvote questions even if the person who asked did his homework? :-/ – Duh Jan 04 '17 at 18:54
  • `str[0]` returns a reference to the `'\0'` terminator, that [*should not be modified*](http://www.cplusplus.com/reference/string/string/operator[]/). Everything after that modification is undefined behavior. – dhke Jan 04 '17 at 18:54
  • `I insert a character using [] operator` - no you don't, and the rest is irrelevant; you are just tinkering with memory you don't own – Oleg Bogdanov Jan 04 '17 at 18:54
  • Read some documentation, you're violating the preconditions of [`string::operator[]`](http://en.cppreference.com/w/cpp/string/basic_string/operator_at). And your compiler should be warning you about multi-character literals here - `str = 'ABC';` – Praetorian Jan 04 '17 at 18:55
  • `essentially a string in single quotes` No, you should have gotten a warning, see http://stackoverflow.com/questions/7755202/multi-character-constant-warnings – Oleg Bogdanov Jan 04 '17 at 18:56
  • I don't want to come off as "not listening" but I didn't get any warnings. I'm not sure why. I'll see if they are being suppressed or turned off. I'm on a PC in my college. – Duh Jan 04 '17 at 19:02
  • @Duh: "*I don't want to come off as "not listening" but I didn't get any warnings. I'm not sure why.*" Because you're using C++. And C++ is not a *safe* language. It allows you to do by default plenty of things that could break if you don't know what you're doing. – Nicol Bolas Jan 04 '17 at 19:15
  • @Duh: "but I didn't get any warnings. I'm not sure why." It depends on the compiler, which you haven't told us about. The thing to understand is `'ABC'` is an int, not a char; if I compile this in gcc I get two warnings, one for the multi-character character constant, and one for the implicit conversion from int to char when I assign the int to the string with its char assignment operator. – Oliver Seiler Jan 04 '17 at 19:38
  • @OlegBogdanov "you are just tinkering with memory you dont own" I understand now but according to the [page](http://www.geeksforgeeks.org/c-string-class-and-its-applications/) from where I was reading about C++ strings "C++ string class internally uses char array to store character". Meaning `str[0] = 'a'` should work with string just like it does with `char str[] = ""` (but it doesn't as we saw). Can you help me understand why `[]` operator has different behavior in dealing with array of characters than string? Should I ask this as a separate question if it qualifies? – Duh Jan 05 '17 at 09:30
  • @Duh see the edit to my answer. – eerorika Jan 05 '17 at 10:41
  • @user2079303 Thanks, you've been really helpful with your answer. Things are much clearer to me now. – Duh Jan 05 '17 at 13:53

4 Answers


Considering a string, I insert a character (not using append()) using [] operator:

string str;
str[0] = 'a';

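You did not insert a character. operator[](size_type pos) returns a reference to the character that already exists at pos. If pos == size(), the non-const overload either has undefined behaviour outright (before C++11) or returns a reference to the null terminator, which must not be changed to anything other than '\0' (since C++11). Your string is empty, so size() == 0, and therefore str[0] = 'a' has undefined behaviour.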
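A minimal sketch of well-defined ways to end up with a one-character string, using only standard std::string members (the helper function names are made up for illustration):

```cpp
#include <cassert>
#include <string>

// Each hypothetical helper builds "a" without writing through str[0]
// on an empty string.

// Append at the end: size grows from 0 to 1.
std::string via_push_back() {
    std::string s;
    s.push_back('a');
    return s;
}

// operator+= with a single char does the same thing.
std::string via_plus_equals() {
    std::string s;
    s += 'a';
    return s;
}

// Give the string a size first; afterwards s[0] names a real element.
std::string via_resize() {
    std::string s;
    s.resize(1);   // s now holds one '\0' character
    s[0] = 'a';    // well defined because 0 < s.size()
    return s;
}

// Construct directly from (count, character).
std::string via_constructor() {
    return std::string(1, 'a');
}
```

All four produce a string whose size() is 1, so printing it with cout shows the a.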

Q1. What happened there? Why was a not printed in the first case?

The behaviour is undefined.


Now, I do something that should throw a compilation error but it doesn't and my question is again, why.

str = 'ABC';

Q2. How's that not an incorrect semantic i.e. assigning a character ... to a string?

Assigning a character to a string is not incorrect semantic. It sets the content of the string to that single character.

Q2. ... a character (which is not really a character but essentially a string in single quotes) ...

It is a multicharacter literal. The type of a multicharacter literal is int. If the compiler supports multicharacter literals, then the semantic is not incorrect.

There isn't an assignment operator for string that would accept an int. However, int is implicitly convertible to char, so the assignment operator that accepts a char is used after the conversion.
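A small sketch of that overload resolution (the helper names are hypothetical; the comment about 'A' assumes an ASCII execution character set):

```cpp
#include <cassert>
#include <string>

// Assigning a single char replaces the whole content with that one character.
std::string assign_char() {
    std::string s = "hello";
    s = 'X';   // uses basic_string::operator=(char)
    return s;
}

// There is no operator= taking an int, so the int is implicitly converted
// to char and the same char overload is used.
std::string assign_int() {
    std::string s = "hello";
    s = 65;    // 65 fits in char; it is 'A' in ASCII
    return s;
}
```

This compiles without a cast, which is exactly why str = 'ABC'; is accepted: the int value of the literal takes the same path as 65 here.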

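char cannot necessarily represent all the values that int can, so the conversion may be out of range. If char is an unsigned type, the value is reduced modulo 2^CHAR_BIT (typically 256). If char is a signed type, the result of the out-of-range conversion is implementation-defined (since C++20 it is also reduced modulo 2^CHAR_BIT); it is not undefined behaviour.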


Q3. Why was the last character printed, not first?

The value of a multicharacter literal is implementation-defined. You'll need to consult the manual of your compiler to find out whether multicharacter literals are supported, and what value you should expect. Furthermore, you'll need to consider the fact that the char that the value is converted to probably cannot represent all values of int.
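To make this concrete, here is a sketch that assumes GCC's documented rule for multicharacter constants (characters are evaluated left to right, each time shifting the previous value left by CHAR_BIT bits and OR-ing in the next character), plus 8-bit chars and ASCII. Other compilers may compute a different value. It converts to unsigned char rather than char so the narrowing step is well defined regardless of char's signedness. The helper names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical helper: mimics GCC's documented rule for the value of a
// multicharacter constant. Not portable; other compilers may differ.
long gcc_style_multichar_value(const std::string& chars) {
    long value = 0;
    for (unsigned char c : chars) {
        value = (value << CHAR_BIT) | c;  // shift previous value, OR in next char
    }
    return value;
}

// Converting to unsigned char keeps only the low CHAR_BIT bits,
// i.e. the bits of the last character that was OR-ed in.
unsigned char low_byte(long value) {
    return static_cast<unsigned char>(value);
}
```

Under these assumptions 'ABC' has the value 0x414243, and its low byte is 0x43, which is 'C'. That matches the str:C output in the question.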


but I didn't get any warnings

Then consider getting a better compiler. This is what GCC warns:

warning: multi-character character constant [-Wmultichar]

 str = 'ABC';

warning: overflow in implicit constant conversion [-Woverflow]


str[0] = 'a' should work with string just like it does with char str[] = "" (but it doesn't as we saw). Can you help me understand why [] operator has different behavior in dealing with array of characters than string?

Because that's how the standard has defined the behaviour and requirements of std::string.

char str[] = "";

Creates an array of size 1, consisting of the null terminator. This element of the array is like any other, and you can freely modify it:

str[0] = 'a';

This is well defined and OK. But now str no longer contains a null-terminated string, so trying to use it as such has undefined behaviour:

cout << "str:" << str << endl; // oops, str is not a null terminated string

So std::string has been designed such that you cannot mess with the final null terminator, as long as you obey the requirements of std::string. Not allowing the null terminator to be touched also lets the implementation avoid allocating a memory buffer for an empty string. Not allocating memory may be faster than allocating memory, so this is a good thing.
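A sketch contrasting the two, using only the standard library (the helper names are hypothetical). With the array you own the terminator and have to maintain it yourself, while std::string keeps its terminator outside of size():

```cpp
#include <cassert>
#include <cstring>
#include <string>

// With a raw array, the terminator is an ordinary element you may overwrite,
// but then you have to provide a new terminator yourself.
std::size_t array_length_after_write() {
    char buf[2] = "";   // room for one character plus '\0'
    buf[0] = 'a';       // fine: buf[0] is a real element
    buf[1] = '\0';      // keep buf a null-terminated string
    return std::strlen(buf);
}

// std::string manages the terminator itself: an empty string still exposes
// a '\0' through c_str() (and through const operator[] at size()),
// but that character is not counted by size().
bool empty_string_terminator_is_hidden() {
    const std::string s{};
    return s.size() == 0 && s.c_str()[0] == '\0' && s[0] == '\0';
}
```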

eerorika

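You should take a look at http://en.cppreference.com/w/cpp/string/basic_string/operator_at, namely the part saying that if pos == size(), the behavior is undefined. (That is the pre-C++11 rule for the non-const overload; since C++11 the access itself is allowed, but changing the returned null terminator to anything other than '\0' is undefined.)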

The following line creates an empty string:

string str;

so size() will return 0.
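If you want the index checked, at() is the bounds-checked counterpart of operator[]: it throws std::out_of_range when pos >= size(). A minimal sketch (the helper name is hypothetical):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// at() compares its argument with size() and throws instead of
// silently invoking undefined behaviour the way operator[] can.
bool at_throws_on_empty() {
    std::string str;
    try {
        str.at(0) = 'a';   // size() is 0, so index 0 is out of range
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```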

Trevor

Your statements string str; str[0] = 'a'; have undefined behaviour, though the reason differs between "before C++11" and "from C++11 on". Note that str is a non-const string. Before C++11, even a (read) access like str[pos] with pos == size() on a non-const string yields undefined behaviour. From C++11 on, a read access is permitted (it yields a reference to the '\0' character), but modifying that character to anything other than '\0' is again undefined behaviour. So much for cppreference's page on std::basic_string::operator[].

Now let's explain the behaviour of a program similar to yours but with well-defined behaviour (I'll then use it as an analogy to describe the behaviour of your program):

#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str = "bbbb";

    const char* cstr = str.data();
    printf("address: %p; content:%s\n", static_cast<const void*>(cstr), cstr);
    // yields e.g. "address: 0x7fff5fbff5d9; content:bbbb"

    str[0] = 'a';
    const char* cstr2 = &str[0];
    printf("address: %p; content:%s\n", static_cast<const void*>(cstr2), cstr2);
    // yields e.g. "address: 0x7fff5fbff5d9; content:abbb"

    cout << "str:" << str << endl;
    // yields "str:abbb"
}

The program is almost self-explanatory, but note that str.data() gives a pointer to the internal data buffer, and that str.data() returns the same address as &str[0].
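The well-defined part of this experiment can also be written with assertions instead of printed addresses. A sketch (the helper name is hypothetical) that relies on non-const operator[] not invalidating pointers into the string, which the standard guarantees since C++11:

```cpp
#include <cassert>
#include <string>

// For a non-empty string, data() and &str[0] point at the same first
// character, and a write through operator[] is visible through data().
bool data_aliases_first_element() {
    std::string str = "bbbb";
    const char* before = str.data();
    str[0] = 'a';   // in range: 0 < str.size()
    return before == &str[0] && str.data()[0] == 'a' && str == "abbb";
}
```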

If we now change the same program to your setting with string str = "", not much changes in the behaviour (although this behaviour is undefined, not safe, not guaranteed, and may differ from compiler to compiler):

string str;  // is the same as string str = ""

const char* cstr = str.data();
printf("address: %p; content:%s\n", static_cast<const void*>(cstr), cstr);
// yields e.g. "address: 0x7fff5fbff5c1; content:"

str[0] = 'a';
const char* cstr2 = &str[0];
printf("address: %p; content:%s\n", static_cast<const void*>(cstr2), cstr2);
// yields e.g. "address: 0x7fff5fbff5c1; content:a"

cout << "str:" << str << endl;
// yields "str:"

Note that str.data() returns the same address as &str[0], and that 'a' has actually been written to that address (if we are lucky, we do not access unallocated memory, since an empty string is not guaranteed to have a buffer ready; maybe we are really lucky). So printing str.data() actually gives you "a" (if we are additionally lucky that the character after 'a' happens to be a string terminator). In any case, the statement str[0] = 'a' does not increase the string's size, which is still 0, so cout << str prints an empty string.
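A sketch showing that printing a std::string is governed by size() rather than by where a '\0' happens to sit (the helper names are hypothetical):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// operator<< for std::string writes exactly size() characters.
std::string print_string(const std::string& s) {
    std::ostringstream out;
    out << s;
    return out.str();
}

// Builds a 3-character string whose middle character is '\0'.
std::string with_embedded_null() {
    return std::string("a\0b", 3);
}

// Printing a raw const char* instead stops at the first '\0'.
std::string print_c_string(const char* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}
```

An empty string prints nothing no matter what bytes sit in its buffer, which is consistent with the str: output above.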

Hope this helps somehow.

Stephan Lechner
string str;

Makes a string of length 0.

str[0] = 'a';

Sets the first element of the string to 'a'. Note that the length of the string is still 0. Also note there may not be space allocated to hold this 'a' and the program is broken at this point so further analysis is best guesses.

cout << "str:" << str << endl;

Prints the contents of the string. The string is length 0, so nothing prints.

cout << "str[0]:" << str[0] << endl;

reaches into undefined territory and tries to read back the previously stored 'a'. This is not guaranteed to work, because the program already has undefined behaviour, so the result is undefined. In this case it gave the appearance of working, possibly the nastiest thing undefined behaviour can do.

str = 'ABC';

is not necessarily an error, since multicharacter literals like this are conditionally supported by the language, but it will most likely (though this is not required) result in a compiler warning, because it's probably a mistake.

cout << "str:" << str << endl;

Your guess is as good as mine as to what the compiler will do, since str = 'ABC'; was logically incorrect (although syntactically valid). The compiler seems to have truncated ABC to the last character, much like storing 257 in an 8-bit unsigned integer keeps only the least significant 8 bits, leaving 1.

user4581301