1

I'm trying to understand what std::string::size() returns.

According to https://en.cppreference.com/w/cpp/string/basic_string/size it's the "number of CharT elements in the string", but I'm not sure how that relates to the number of printed characters, especially if string termination characters are involved somehow.

This code

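#include <iostream>
#include <string>

int main()
{
    // Adjacent string literals are concatenated into "foo\0bar".
    std::string str0 = "foo" "\0" "bar";
    std::cout << str0 << std::endl;
    std::cout << str0.size() << std::endl;

    std::string str1 = "foo0bar";
    str1[3] = '\0';  // overwrite the '0' character with a null character
    std::cout << str1 << std::endl;
    std::cout << str1.size() << std::endl;
    return 0;
}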

prints

foo
3
foobar
7
  • In the case of str0, the size matches the number of printed characters. I assume the constructor iterates on the characters of the string literal until it reaches \0, which is why only 'f', 'o' and 'o' are put in the std::string, i.e. 3 characters, and the string termination character is not put in the std::string.
  • In the case of str1, the size doesn't match the number of printed characters. I assume the same went on as what I described above, but that I broke something by assigning a character. According to cppreference.com, "the behavior is undefined if this character is modified to any value other than CharT()", so I assume I've walked into undefined behavior here.

My question is this: outside of undefined behavior, is it possible that the size of a std::string doesn't match the number of printed characters, or is it actually something guaranteed by the standard?

(note: if the answer to that question changed between versions of the standard I'm interested in knowing that too)

Konrad Rudolph
Eternal
  • `According to https://en.cppreference.com/w/cpp/string/basic_string/operator_at, the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.` Why? `'\0'` is a valid CharT in your case. `is it possible that the size of a std::string doesn't match the number of printed characters` size returns the number of `CharT`s in your string, it has nothing to do with whether they are printable or not. Strings _can_ contain binary data. – tkausl Nov 30 '20 at 12:25
  • Note: `std::string str0 = "foo" "\0" "bar";` is equivalent to `std::string str0 = "foo";` Probably you wanted to construct it like this `std::string str0 ("foo\0bar", 7);`? – Algirdas Preidžius Nov 30 '20 at 12:26
  • @AlgirdasPreidžius just `std::string str0("foo\0bar"s)` is enough. See [How do you construct a std::string with an embedded null?](https://stackoverflow.com/q/164168/995714) – phuclv Nov 30 '20 at 12:32
  • @phuclv That works too. I forgot about the existence of the `string_literals` namespace. – Algirdas Preidžius Nov 30 '20 at 12:43
  • @tkausl The ref says `CharT()`, i.e. specifically the default value for CharT, and I assumed it was something other than `\0` – Eternal Nov 30 '20 at 13:44

2 Answers

3

In the case of str1 ... the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.

Your assumption is wrong. There is no UB for two reasons:

  • You assigned '\0' to the element, and for char that is exactly the value of CharT(). It would therefore be well defined to assign that value even to str1[str1.size()].
  • Furthermore, str1.size() is 7, as you demonstrated, and index 3 is less than 7. The element is within bounds, so assigning any value to it is well defined.

is it possible that the size of a std::string doesn't match the number of printed characters

Yes, it is possible. std::string can contain non-printable characters as well, and thus the size is not necessarily the same as the number of printed characters. Your example str1 has no undefined behaviour and demonstrates how size can be different from number of printed characters.

Besides non-printable characters, in some character encodings, notably the Unicode encodings, what a reader perceives as one character (a grapheme cluster) may consist of multiple code points, each of which may in turn consist of multiple code units (a code unit is a single char object). The size of the string is the number of chars, i.e. the number of code units. Thus, one should not expect the size of the string to match the number of printed characters.

or is it actually something guaranteed by the standard?

No such guarantee exists.

if the answer to that question changed between versions of the standard I'm interested in knowing that too

There has been no change regarding this.

eerorika
2

std::string has several constructors, one of which takes a const char*, and that's the one that constructs str0. Because no length information is provided, the string is initialized only with the characters up to (and not including) the first null terminator.

In the case of str1, the string length really is 7 characters. Replacing str1[3] with '\0' doesn't change the length; the content is simply "foo\0bar" now. Unlike a C string, a std::string can contain embedded nulls because it stores its length separately. Therefore cout << str1 << endl; writes exactly 7 bytes. You just don't see the '\0' byte in the output because it is ASCII NUL, which isn't a printable character.

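It's recommended to use the s suffix: it can construct the std::string faster (the literal's length is passed along, so there is no scan for the terminator), and it lets you construct from a literal with embedded nulls directly, without resorting to another constructor. Try auto str0 = "foo\0bar"s; (this needs C++14 and using namespace std::string_literals;) and see.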

phuclv
  • Why did `str1[3] = '\0';` act as an insert rather than a replace? And how do I replace a character, e.g. if I'd like to replace 'b' with '\0'? – ytlu Nov 30 '20 at 14:40
  • @ytlu who said that? It replaces `'0'` with `'\0'`. If you want to replace the `'b'` in the 5th position then use `str1[4] = '\0'` – phuclv Nov 30 '20 at 14:42