26

I have an issue in which the size of a string is affected by the presence of a '\0' character. I searched all over SO and still could not find an answer.

Here is the snippet.

#include <iostream>
#include <string>

int main()
{
  std::string a = "123123\0shai\0";
  std::cout << a.length();
}

http://ideone.com/W6Bhfl

The output in this case is

6

Whereas the same program with a different string, which has digits instead of letters after the '\0',

#include <iostream>
#include <string>

int main()
{
  std::string a = "123123\0123\0";
  std::cout << a.length();
}

http://ideone.com/mtfS50

gives an output of

8

What exactly is happening under the hood? How does the presence of a '\0' character change the behavior?

edmz
samairtimer
  • 6
    Do not put null characters (\0) in strings unless you have a very good idea what you are doing and why! – pjc50 Nov 08 '16 at 14:58
  • 2
    And if you really do need strings with embedded null characters, you'll want to use `std::literals::string_literals::operator""s` (C++14) or `std::string(const char*, size_t)` (remember to include the final null in the count if you want one). – Toby Speight Nov 08 '16 at 18:01
  • 2
    Note that your second string replaces "shai" (four characters) with "123" (three characters), so there would be a difference even without the octal sequence mentioned in the accepted answer. – Kyle Strand Nov 08 '16 at 18:45
  • [You should have also searched in the C tag](http://stackoverflow.com/questions/14264458/strlen-the-length-of-the-string-is-sometimes-increased-by-1). – Daniel Fischer Nov 08 '16 at 22:21
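
The construction options mentioned in the comments above can be compared side by side. This is a minimal sketch, assuming a C++14 compiler; the function names `from_pointer`, `from_pointer_and_length` and `from_s_literal` are made up for illustration and do not come from the question.

```cpp
#include <string>

// Built from a const char* only: the constructor scans for the first '\0'
// and stops there, so only "123123" is copied.
inline std::string from_pointer() {
    return std::string("123123\0shai\0");
}

// Built from a pointer plus an explicit count: all 12 characters are copied,
// including the embedded '\0' and the '\0' written at the end of the literal.
inline std::string from_pointer_and_length() {
    return std::string("123123\0shai\0", 12);
}

// Built with the C++14 `s` literal suffix: the length is taken from the
// literal itself (12 here, not counting the implicit terminator).
inline std::string from_s_literal() {
    using namespace std::string_literals;
    return "123123\0shai\0"s;
}
```

With these definitions, `from_pointer().size()` is 6, while the other two both return strings of size 12.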

2 Answers

47

The sequence \012, when used in a string (or character) literal, is an octal escape sequence. It is the octal number 12 (decimal 10), which corresponds to the ASCII line feed ('\n') character.

That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).

It would have been very clear if you had tried to print the contents of the string.

Octal escape sequences are one to three digits long, and the compiler will consume as many octal digits as possible, up to that limit of three.
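
The rule can be checked at compile time, and the string contents made visible, with a minimal sketch like the following. It assumes an ASCII execution character set, and `dump` is a hypothetical helper name introduced only for this illustration.

```cpp
#include <cstdio>
#include <string>

// sizeof on a string literal counts every character plus the implicit terminator.
// "123123\0shai\0": 6 digits + '\0' + "shai" + '\0' + terminator = 13
static_assert(sizeof("123123\0shai\0") == 13,
              "'s' is not an octal digit, so the escape ends after the 0");
// "123123\0123\0": 6 digits + '\012' + '3' + '\0' + terminator = 10
static_assert(sizeof("123123\0123\0") == 10,
              "the digits 0, 1 and 2 are consumed as one octal escape");
// Assumption: ASCII, where octal 12 (decimal 10) is the line feed.
static_assert('\012' == '\n', "octal 12 is the line feed in ASCII");

// Hypothetical helper: renders each character of s as a two-digit hex code
// followed by a space, so non-printable characters become visible.
inline std::string dump(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits, one space, one terminator
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

Calling `dump(std::string("123123\0123\0"))` yields `31 32 33 31 32 33 0a 33 `: eight characters, with the line feed (`0a`) in the seventh position.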

Some programmer dude
  • Bingo! This was eating up my mind, thanks for answer. – samairtimer Nov 08 '16 at 09:43
  • Why isn't the first string length 12 and the second one 9? – Kyle Strand Nov 08 '16 at 18:42
  • ...and why is the second string length *larger* than the first? – Kyle Strand Nov 08 '16 at 18:42
  • 1
    @KyleStrand Because the `std::string` ctor taking a `const char*` treats it as a NUL-terminated string. – Angew is no longer proud of SO Nov 08 '16 at 19:20
  • @Angew Ah, sorry, you're right--so why not 13 and 10? – Kyle Strand Nov 08 '16 at 19:27
  • @KyleStrand Because it is "fooled" by the first `\0` in the buffer. – Angew is no longer proud of SO Nov 08 '16 at 19:29
  • 2
    @KyleStrand It sounds like you need some exposure to C strings. They do not behave the same as strings in higher level languages like Java, C#, Python, etc. The "null character" (`\0`) has a special meaning in C strings; namely, it marks the end of the string. – jpmc26 Nov 08 '16 at 19:47
  • @Angew Okay, you're right--I didn't think about the fact that `std::string(const char*)` must rely on null-termination of the argument to initialize the underlying data, since there's no other way for it to know the length of the string. – Kyle Strand Nov 08 '16 at 21:23
  • 1
    @jpmc26 I'm not sure what about my comments indicates that I don't understand how C strings work. My confusion was about the behavior of the `std::string` constructor invoked. Note that even in C++98, `std::string` *could* have had a constructor taking a `const char[]` *reference*, which would have allowed it to accept embedded `null` characters from string-literals: `std::string(const char (&ch)[N])`. This could have behaved identically to the `std::string(const char*, size_type count)` constructor. – Kyle Strand Nov 08 '16 at 21:40
  • .... I simply didn't realize that in fact this is not a constructor provided by `std::string`. – Kyle Strand Nov 08 '16 at 21:40
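
The array-reference idea from the comments above can be tried out as a free function. This is only a sketch: `from_literal` is a hypothetical helper, not something `std::string` provides, and it assumes the argument is a string literal (or another array whose last element is the terminator).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: binds to the literal as an array, so its size N is
// known at compile time and embedded '\0' characters are not treated as the end.
template <std::size_t N>
std::string from_literal(const char (&text)[N]) {
    // N counts the implicit terminator, so copy N - 1 characters.
    return std::string(text, N - 1);
}
```

With this helper, `from_literal("123123\0shai\0").size()` is 12 and `from_literal("123123\0123\0").size()` is 9, whereas the plain `std::string(const char*)` constructor gives 6 and 8.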
12

If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.

Bo Persson
  • 2
    What *stops* the octal sequence inside a string? –  Nov 08 '16 at 07:38
  • 5
    @Raw - Either the length of the sequence or a character that cannot be an octal digit. Like the s in `\0shai`. – Bo Persson Nov 08 '16 at 07:39
  • 1
    Is it possible to create a literal with digits (0-7) following \0? Only with something like `"\0" + "12"` (or perhaps even that won't work)? – slawekwin Nov 08 '16 at 07:41
  • 11
    @slawekwin - You might have to separate the strings if you do *not* want to form a max length octal sequence, like `"12345" "\01" "678"` to have a character with ASCII code 1 in the middle. The compiler will combine adjacent string literals. You can also get a std::string *including* the nul characters if you use the string constructor with a length parameter, like `std::string a("123123\0shai\0", 12);`. That will include all 12 characters in the string. – Bo Persson Nov 08 '16 at 07:47
  • @BoPersson : Clears a lot of air, thanks a lot for the answer. – samairtimer Nov 08 '16 at 09:44
  • 1
    @slawekwin An octal character can have at most 3 digits, so `"\0007"` is two characters: `'\0'` and `'7'`. – Angew is no longer proud of SO Nov 08 '16 at 19:22
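
The two points from these comments (splitting a literal to end an escape early, and the three-digit limit) can be verified at compile time. A minimal sketch; the array names `split`, `merged` and `capped` are illustrative only.

```cpp
// Escape sequences are translated before adjacent literals are joined,
// so "\01" stays a single character with value 1.
constexpr char split[]  = "12345" "\01" "678";
// Written as one literal, the escape greedily takes three digits: \016.
constexpr char merged[] = "12345\01678";
// After three octal digits the escape is complete; the 7 is an ordinary character.
constexpr char capped[] = "\0007";

static_assert(sizeof(split) == 10, "5 + 1 + 3 characters plus the terminator");
static_assert(split[5] == 1 && split[6] == '6', "value 1, then the digit 6");

static_assert(sizeof(merged) == 9, "5 + 1 + 2 characters plus the terminator");
static_assert(merged[5] == 14 && merged[6] == '7', "octal 16 is decimal 14");

static_assert(sizeof(capped) == 3, "'\\0', '7' and the terminator");
static_assert(capped[0] == '\0' && capped[1] == '7', "escape stops after 3 digits");
```

If any of these assumptions about how the literals are parsed were wrong, the corresponding `static_assert` would stop compilation.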