1

I'm very new to C and am a bit confused as to when we need to manually add the terminating '\0' character to strings. Given this function to calculate string length (for clarity's sake):

int stringLength(char string[])
{
    int i = 0;
    while (string[i] != '\0') {
        i++;
    }
    return i;
}

which calculates the string's length based on the null terminating character. So, using the following cases, what is the role of the '\0' character, if any?

Case 1:

char * stack1 = "stack"; 
printf("WORD %s\n", stack1);
printf("Length %d\n", stringLength(stack1));

Prints:

WORD stack
Length 5

Case 2:

char stack2[5] = "stack";
printf("WORD %s\n", stack2);
printf("Length %d\n", stringLength(stack2));

Prints:

WORD stack���
Length 8

(These results vary each time, but are never correct).

Case 3:

char stack3[6] = "stack";
printf("WORD %s\n", stack3);
printf("Length %d\n", stringLength(stack3));

Prints:

WORD stack
Length 5

Case 4:

char stack4[6] = "stack";
stack4[5] = '\0';
printf("WORD %s\n", stack4);
printf("Length %d\n", stringLength(stack4));

Prints:

WORD stack
Length 5

Case 5:

char * stack5 = malloc(sizeof(char) * 5);
if (stack5 != NULL) {
    stack5[0] = 's';
    stack5[1] = 't';
    stack5[2] = 'a';
    stack5[3] = 'c';
    stack5[4] = 'k';
    printf("WORD %s\n", stack5);
    printf("Length %d\n", stringLength(stack5));
}
free(stack5);

Prints:

WORD stack
Length 5

Case 6:

char * stack6 = malloc(sizeof(char) * 6);
if (stack6 != NULL) {
    stack6[0] = 's';
    stack6[1] = 't';
    stack6[2] = 'a';
    stack6[3] = 'c';
    stack6[4] = 'k';
    stack6[5] = '\0';
    printf("WORD %s\n", stack6);
    printf("Length %d\n", stringLength(stack6));
}
free(stack6);

Prints:

WORD stack
Length 5

Namely, I would like to know the difference between cases 1, 2, 3, and 4: why is case 2 erratic, why is there no need to specify the null-terminating character in cases 1 and 3, and why do 3 and 4 behave the same? I would also like to know how 5 and 6 print the same thing even though case 5 does not allocate enough memory for the null-terminating character. Since only 5 char slots are allocated, one for each letter in "stack", how does it detect a '\0' character, i.e. the 6th character?

I'm so sorry for this absurdly long question; I just couldn't find a good didactic explanation of these specific instances anywhere else.

Coach
  • 1
    Very broadly, if you have a character string stored in an array, then you must have some way of knowing where the string ends. The two most obvious ways are (1) to keep a separate character count, or (2) to terminate the string with some unique character (e.g. `'\0'`). Option 2 seems to be the most common method today, and C automatically terminates string constants with `'\0'`. The standard C libraries also expect `'\0'`-terminated strings. – Tom Karzes Sep 17 '17 at 06:04
  • @TomKarzes: "*C automatically terminates string constants with '\0'*" well, just string-*literals*. – alk Sep 17 '17 at 13:55
  • Do not try to learn C by trial and error, as this is known to cause depression. – alk Sep 17 '17 at 13:57

2 Answers

7

The storage for a string must always leave room for the terminating null character. In cases 2 and 5 you don't do this, because you explicitly give a length of 5. Passing those buffers to printf with %s, or to stringLength, then reads past the end of the storage, which is undefined behavior.

String literals always get the null terminator automatically. Even though strlen returns a length of 5, it is really taking 6 bytes.
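As a small sketch of that arithmetic (the helper names `storageNeeded` and `fitsWithTerminator` are made up for illustration and are not part of the question or of any library), the number of bytes a string needs is its `strlen` plus one for the terminator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes required to store s, including the '\0'. */
size_t storageNeeded(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper: 1 if a buffer of cap bytes can hold s
   together with its terminator, 0 otherwise. */
int fitsWithTerminator(const char *s, size_t cap)
{
    assert(s != NULL);
    return storageNeeded(s) <= cap;
}
```

Under these assumptions, `storageNeeded("stack")` is 6, the same as `sizeof("stack")`, so a 5-byte buffer does not fit the word and a 6-byte buffer does.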

Your case 5 only works because undefined sometimes means looking like it worked. You probably have a value of zero following the string in memory - but you can't rely on that.

Mark Ransom
5

In case 1, you are creating a string literal (a constant that is typically placed in read-only memory and must not be modified), which has the \0 implicitly added to it.

Since \0's position is relied upon to find the end of string, your stringLength() function prints 5.

In case 2, you are trying to initialise a character array of size 5 with a string of 5 characters leaving no space for the \0 delimiter. The memory adjacent to the string can be anything and might have a \0 somewhere. This \0 is considered the end of string here which explains those weird characters that you get. It seems that for the output you gave, this \0 was found only after 3 more characters which were also taken into account while calculating the string length. Since the contents of the memory change over time, the output may not always be the same.
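One way to avoid running off the end of such an array is to pass the array's capacity along with it. The following is only a sketch of that idea; `boundedLength` and `isTerminated` are hypothetical names, not standard functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to the first '\0',
   but never reads more than cap bytes from s. */
size_t boundedLength(const char *s, size_t cap)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < cap && s[i] != '\0') {
        i++;
    }
    return i;
}

/* Hypothetical helper: 1 if buf[0..cap-1] contains a '\0',
   i.e. the buffer holds a properly terminated string. */
int isTerminated(const char *buf, size_t cap)
{
    return boundedLength(buf, cap) < cap;
}
```

For a 5-element array holding exactly the letters of "stack", `isTerminated` returns 0, which signals that the array should not be passed to printf with %s.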

In case 3, you are initialising a character array of size 6 with a string of size 5 leaving enough space to store the \0 which will be implicitly stored. Hence, it will work properly.

Case 4 is similar to case 3. No modification is done by

stack4[5] = '\0';

because the size of stack4 is 6, so its last index is 5. You are overwriting an element with the value it already holds: stack4[5] contained \0 even before you assigned to it.
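The behavior of cases 3 and 4 can be checked with a short sketch. The function names below are hypothetical; the code relies only on the rule that array elements not covered by the string literal are initialised to zero:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the last element of buf[0..cap-1] is '\0'. */
int lastElementIsNul(const char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    return buf[cap - 1] == '\0';
}

/* Returns 1 if every initialiser form below yields a terminated array. */
int initialisersTerminate(void)
{
    char exact[6] = "stack";   /* as in case 3: 5 letters + '\0' */
    char deduced[] = "stack";  /* compiler picks the size, which is 6 */
    char padded[8] = "stack";  /* leftover elements are set to zero */

    return sizeof deduced == 6
        && lastElementIsNul(exact, sizeof exact)
        && lastElementIsNul(deduced, sizeof deduced)
        && padded[5] == '\0' && padded[6] == '\0' && padded[7] == '\0';
}
```

Leaving the size out, as in `deduced`, is often the least error-prone form because the compiler counts the terminator for you.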

In case 5, you have completely filled the character array with characters without leaving space for \0. Yet when you print the string, it prints right. I think it is because the memory adjacent to the memory allocated by malloc() merely happened to be zero which is the value of \0. But this is undefined behavior and should not be relied upon. What really happens depends on the implementation.
It should be noted that malloc() will not initialise the memory that it allocates unlike calloc().

Both

str[2] = '\0';

and

str[2] = 0;

are exactly the same, since '\0' is simply the value 0.

But you cannot rely on it being zero. Dynamically allocated memory may happen to contain zeros because of how the operating system works and for security reasons. See here and here for more about this.

If you need the default value of dynamically allocated memory to be zero, you can use calloc().
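A minimal sketch of that approach, assuming a hypothetical helper named `copyChars` (not a standard function): allocating one byte more than the number of characters with calloc() leaves a zero byte at the end, which serves as the terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, null-terminated copy
   of the first n characters of src, or NULL if allocation fails.
   src must point to at least n readable bytes.
   The caller is responsible for calling free() on the result. */
char *copyChars(const char *src, size_t n)
{
    assert(src != NULL);
    char *dst = calloc(n + 1, sizeof *dst);  /* n letters + 1 zeroed byte */
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, n);  /* dst[n] is not written and stays '\0' */
    return dst;
}
```

With malloc() instead of calloc(), the same function would need an explicit `dst[n] = '\0';` after the copy, as case 6 does.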

Case 6 has the \0 in the end and characters in the other positions. The proper string should be displayed when you print it.

J...S
  • 4
    *"In case 5...I think it is because the memory adjacent to the memory allocated by malloc() was zero"* -- arguably could/could not be true. It is because you have invoked *Undefined Behavior* -- anything can happen, if the next byte in memory just happens to be `0`, then it works, but there is no guarantee what will happen -- *and* `malloc` does not initialize memory. But all considered, certainly worth a vote, will be better if you fix that statement. – David C. Rankin Sep 17 '17 at 08:24
  • 1
    @DavidC.Rankin Thanks for correcting me. I edited it. – J...S Sep 17 '17 at 09:47
  • 1
    This helped immensely in my understanding, thank you so much @J...S! What would happen in case 6 if I didn't explicitly add the line `stack6[5] = '\0';` since it's not a string literal would it add the '\0' character? I assume it wouldn't because I'm explicitly adding characters. So if I use `malloc` and not `calloc`, would this result in undefined behavior for the terminating character (i.e. rely on the cruft from the previous memory usage)? – Coach Sep 17 '17 at 19:41