So I got curious reading some C code; let's say we have the following code:

char text[10] = "";

Where does the C compiler then put the null character?

I can think of 3 possible cases

  1. In the beginning, and then 9 characters of whatever used to be in memory
  2. In the end, so 9 characters of garbage, and then a trailing '\0'
  3. It fills it completely with 10 '\0'

The question is whether, depending on the case, it's necessary to add the trailing '\0' when doing a strncpy. If it's case 2 or 3, then it's not strictly necessary, but a good idea; if it's case 1, then it's absolutely necessary.

Which is it?

Electric Coffee
  • I'll take door #3, Monty. And that has nothing to do with what `strncpy()` does. What it should do is what it is specified to do. – Sam Varshavchik Nov 12 '16 at 16:53
  • You ask about the C compiler, but tagged this both C and C++. There are a few differences between C and C++ that are relevant here, so the C++ tag is inappropriate if that's not what you're asking. –  Nov 12 '16 at 16:54
  • @hvd thought I'd drop C++ in there too, since it would be interesting how both cases are handled. But you're right; the text itself doesn't mention C++ – Electric Coffee Nov 12 '16 at 16:57
  • Note that even if it was case 1, `strncpy` would work properly and you wouldn't need to add a trailing `'\0'`, because `strncpy` would see the `'\0'` byte and wouldn't look past it. If it was case 2, `strncpy` would actually process the junk bytes in the beginning, which is pretty bad. As others have said, it's case 3 that's true, but I wanted to clarify your understanding of case 1 & 2. – Alok Singhal Nov 12 '16 at 17:07
  • @AlokSinghal the point here is that `strncpy` doesn't add a trailing `\0` like `strcpy` does. – Electric Coffee Nov 12 '16 at 17:14
  • Are you using `text` as the source or destination in `strncpy`? If destination, then it doesn't matter what it contains. If you are using `text` as source, then the `len` parameter would be >= 0, and `strncpy` will copy the `'\0'` from the source. – Alok Singhal Nov 12 '16 at 17:18

3 Answers

In your initialization, the text array is filled with null bytes (i.e. option #3).

char text[10] = "";

is equivalent to:

char text[10] = { '\0' };

That is, the first element of text is explicitly initialized to zero, and the remaining elements are implicitly zero-initialized, as required by C11 6.7.9 Initialization, paragraph 21:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

P.P

Quoting N1256 (roughly C99), since there are no relevant changes to the language before or after:

6.7.8 Initialization

14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

"" is a string literal consisting of one character (its terminating null character), and this paragraph states that that one character is used to initialise the elements of the array, which means the first character is initialised to zero. There's nothing in here that says what happens to the rest of the array, but there is:

21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

This paragraph states that the remaining characters are initialised the same as if they had static storage duration, which means the rest of the array gets initialised to zero as well.

Worth mentioning here as well is the "if there is room" in p14:

In C, char a[5] = "hello"; is perfectly valid too, and for this case too you might want to ask where the compiler puts the null character. The answer here is: it doesn't.

Community

The string literal "" has type char[1] in C and const char[1] in C++.

You can imagine it the following way

In C

char no_name[] = { '\0' };

or in C++

const char no_name[] = { '\0' };

When a string literal is used to initialize a character array then all its characters are used as initializers. So for this declaration

char text[10] = "";

you in fact have

char text[10] = { '\0' };

All other elements of the array that have no corresponding initializer (that is, all except the first element, text[0]) are initialized to 0.

From the C Standard (6.7.9 Initialization)

14 An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

and

21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration

and finally

10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

— if it has pointer type, it is initialized to a null pointer;

— if it has arithmetic type, it is initialized to (positive or unsigned) zero;

— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

The C++ Standard contains similar wording.

Note that in C you may, for example, write the following:

char text[5] = "Hello";
         ^^^

In this case the character array will not have the terminating zero because there is no room for it. :) It is the same as if you defined

char text[5] = { 'H', 'e', 'l', 'l', 'o' };
Vlad from Moscow