I'm struggling with this part in the C standard about string literals, especially the second part of it:

"In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. 80)"


"80) A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."

Source: ISO/IEC 9899:2018 (C18), §6.4.5/6, Page 51

I don't understand the explanation - "because a null character can be embedded in it by a \0 escape sequence.".


The referenced section, §7.1.1, defines a "string" as follows:

"A string is a contiguous sequence of characters terminated by and including the first null character."

Source: ISO/IEC 9899:2018 (C18), §7.1.1/1, Page 132

I thought the emphasis might lie on the "can", meaning that a string literal does not have to embed a null character, while a string is required to contain one.

But then I ask myself: how can a string literal be used as a string if it has no terminating null character to mark the end of the string (as required by string-handling functions)?

I'm drawing a complete blank at the moment.


Note: I'm aware that a string literal is typically stored in read-only memory and must not be modified (doing so is undefined behavior), and that "string" is a generic term for a sequence of characters terminated by NUL, which may or may not be mutable.

Thus, my question is not: "What is the difference between a string and a string literal?"

My Question is:

  • Why/How can a string-literal not be a string?

and, following from my reasoning so far:

  • Is it true that a string literal can have the NUL byte omitted?

I wanted to ask this question myself, but shortly before posting it I figured it out. My confusion came from the slightly misleading wording of the quote.

I decided not to delete the question's draft, since it could be useful for future readers, and to provide a Q&A instead.

Feel free to comment and hint.


  • I'd say it's a bug in the documentation -- having `char *p = "foo\0bar";` we can say `p` points to a string with length 3; `p+4` points to another string :) – pmg Jun 06 '20 at 14:52
  • @pmg Indeed. `"foo\0bar"` is a string literal that contains ***two*** strings, thus it is not ***a*** string. Yes, it really is that simple. – Andrew Henle Jun 06 '20 at 15:46

2 Answers

You're overthinking it.

"A string is a contiguous sequence of characters terminated by and including the first null character."

Source: ISO/IEC 9899:2018 (C18), §7.1.1/1, Page 132

says that a "string" only extends up to the first null character. Characters that may exist after the null are not part of the string. However,

"80) A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."

makes it clear a string literal may contain an embedded null. If it does, the string literal AS A WHOLE is not a string -- the string is just the prefix of the string literal up to and including the first null.
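A minimal sketch of that distinction, assuming a hosted C implementation. The array name `literal` and the helper function names are made up for illustration; only `strlen`, `strcmp`, `sizeof` and `assert` come from the language and standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The array initialized from the literal has 8 elements:
   'f' 'o' 'o' '\0' 'b' 'a' 'r' plus the appended '\0'. */
const char literal[] = "foo\0bar";

/* Length of the string that starts at the beginning of the array;
   strlen stops at the first null character. */
size_t first_string_length(void)
{
    return strlen(literal);
}

/* Size of the whole array, counting both null characters. */
size_t literal_size(void)
{
    return sizeof literal;
}

/* The characters after the embedded null form a second string. */
const char *second_string(void)
{
    return literal + first_string_length() + 1;
}

/* Checks the three observations above. */
void demo_embedded_null(void)
{
    assert(first_string_length() == 3);
    assert(literal_size() == 8);
    assert(strcmp(second_string(), "bar") == 0);
}
```

String functions only ever see the 3-character prefix, while `sizeof` reports the full 8-byte array, which is why the literal as a whole is not a single string.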

Chris Dodd

Let's take a look at the definition of the term "string literal" in the same section of C18, §6.4.5/3:

"A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."

According to that, a string literal consists only of the characters enclosed in quotation marks, the bare string content. It does not have an appended \0. The NUL byte is appended later, during translation, as stated in §6.4.5/6:

"In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. 80)"


Let's look at an example:

"foo" is a string literal, but not a string because "foo" does not contain an embedded null character.

"foo\0" is a string literal and a string because the literal itself contains a null character at the end of the character sequence.


Note that you don't need to explicitly insert the null character at the end of a string literal to turn it into a string. As already said, it is implicitly appended during program translation.

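That means that, as far as the resulting string is concerned,

const char *s = "foo";

is equivalent to

const char *s = "foo\0";

(The array created for the second literal holds one extra null byte, but the string that s points to is the same.)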
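A small C11 sketch of the implicit terminator, assuming a hosted implementation. The function names `same_string` and `demo_implicit_terminator` are hypothetical; `_Static_assert` needs C11 or later.

```c
#include <assert.h>
#include <string.h>

/* Compile-time checks (C11): sizeof counts every element of the
   array that the literal produces, including appended nulls. */
_Static_assert(sizeof "foo" == 4, "3 characters + appended null");
_Static_assert(sizeof "foo\0" == 5,
               "3 characters + explicit null + appended null");

/* Returns 1 if both literals yield the same string "foo". */
int same_string(void)
{
    const char *a = "foo";
    const char *b = "foo\0";
    return strlen(a) == 3 && strlen(b) == 3 && strcmp(a, b) == 0;
}

void demo_implicit_terminator(void)
{
    assert(same_string());
}
```

The two arrays differ in size by one byte, but string functions cannot tell them apart because both stop at the first null character.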

I admit that the sentence

"A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."

is a little confusing and illogical in the context. It would be better phrased:

"A string literal might not be a string (see 7.1.1), because a null character might not (OR is not required to) be embedded in it by a \0 escape sequence."

or alternatively:

"A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."


As @EricPostpischil pointed out in his comment, the meaning of the footnote is probably quite different.

It means that if the string literal contains a null character somewhere inside it, rather than only at the end as a string requires, the string literal is not equivalent to a string.

For example, the string literal

"foo\0bar"

is not a string, as it contains the first null character embedded inside of the string literal, but not at the end of it.
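One way to make this reading concrete is to test whether the first null character of the array a literal produces is also its last element. The sketch below assumes that interpretation of §7.1.1; the helper `whole_array_is_string` and the macro `LITERAL_IS_STRING` are made-up names, not anything defined by the standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the n-element array is, as a whole,
   a string in the sense of 7.1.1, i.e. its first null character
   is its last element. */
bool whole_array_is_string(const char *arr, size_t n)
{
    if (n == 0) {
        return false;
    }
    const char *first_null = memchr(arr, '\0', n);
    return first_null == arr + n - 1;
}

/* Applies the helper to the full array a literal produces,
   including the null appended in translation phase 7. */
#define LITERAL_IS_STRING(lit) whole_array_is_string((lit), sizeof (lit))

void demo_whole_array(void)
{
    assert(LITERAL_IS_STRING("foo"));        /* f o o \0: null only at the end */
    assert(!LITERAL_IS_STRING("foo\0bar"));  /* first null is in the middle    */
    assert(!LITERAL_IS_STRING("abc\0\0"));   /* a null follows the first null  */
}
```

Using `memchr` rather than `strlen` keeps the search bounded by the array size, so the check never reads past the literal.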

  • You have missed, or at least not made clear, an aspect. The sequence of characters defined by `"abc\0def"` is not a string because it is not terminated by its first null character. 7.1.1 says “A *string* is a contiguous sequence of characters terminated by and including the first null character.” That is why footnote 80 says a string literal might not be a string: A string literal might not conform to the rule that a string ends with its first null character. – Eric Postpischil Jun 06 '20 at 14:52
  • @EricPostpischil Ahh, that makes sense. I focused on the existence of the null character itself but not its location. Yes, then actually, the footnote makes sense. I took your input into the answer. But nonetheless, I think this footnote is not quite clear and idiot-proof, like it was well shown by this question. :-) – RobertS supports Monica Cellio Jun 06 '20 at 15:21
  • @EricPostpischil But it gives another question to me: Is `"abc\0\0"` by that definition a string? – RobertS supports Monica Cellio Jun 06 '20 at 15:38
  • `"abc\0\0"` has a character, `\0`, after its first null character, so that character sequence does not conform to the rule for what a string is. – Eric Postpischil Jun 06 '20 at 15:39