
I'm taking CS50x and am on week 2 now. My question is: why do we need a null character '\0' in strings (i.e., null-terminated char arrays) to mark the end, when we don't need one in ordinary char arrays or in arrays of other types, such as an int array? For example, when printing both arrays (the null-terminated char array and the int array), what marks the end of the second array?

I tried to demonstrate to myself how strings are implemented with some code:

This code worked, printing "hi!" in the terminal.

This also worked, printing the three scores.

Why did the first program need an additional place in the array for the null character? Couldn't we have used i < 3 instead, as we did in the second program? A character array, like any other array, has a specific length, so what changed when we decided to treat a string as a character array?

John Kugelman
Eyad Ayman
  • Could you paste the two code examples into your question as text? Screenshots of code are discouraged since we can't copy and paste them. – John Kugelman Aug 10 '23 at 01:46
  • "Couldn't we have used i < 3 instead as we did in the second code?" - well, sure, if you only wanted to be able to deal with arrays of exactly 3 characters. – user2357112 Aug 10 '23 at 01:46
  • A `char[2]` is a different type from a `char[3]`, which is a different type from a `char[4]`, etc. If you want to have _just one_ string type that works for any length -- potentially without knowing the length ahead of time so arbitrary content can be handled -- the "just define it in your code so it's known at compile time" approach doesn't work at all; you need to either store the length in memory at runtime (which is the approach some other languages use -- traditionally, we call those "Pascal strings"), or signify the end when a terminator is seen (which is how C strings do it). – Charles Duffy Aug 10 '23 at 01:50
  • I'm sure there's a good duplicate, but I didn't find one – klutt Aug 10 '23 at 01:53
  • See: https://en.wikipedia.org/wiki/Sentinel_value https://en.wikipedia.org/wiki/Null_character and https://en.wikipedia.org/wiki/Null-terminated_string The advantage is simplicity and flexibility (vs. "pascal strings"). In the 3rd link, note the 2nd paragraph of the history section. – Craig Estey Aug 10 '23 at 02:25
  • The whole story is that back in ancient times when C was invented, one of the major changes from its predecessors was that arrays did not store their size together with the array data - you can read all the details here: https://www.bell-labs.com/usr/dmr/www/chist.html. At the same time as Unix moved over to C, it also moved away from "fixed length strings" to null-terminated strings. For a while it supported both types of strings, until null-terminated strings became the standard. A historical remnant of this is the `strncpy` function, which is only to be used for these old fixed length strings. – Lundin Aug 10 '23 at 07:47

3 Answers


The truth is that you don't need null terminators. They're just the convention that the C library chose to represent the end of the string.

For some purposes, it's a terrible choice. One example is when strings might contain null bytes. Another is when the string length must be computed often: the only way is to traverse the whole (potentially very long) string.

A method without these problems would be to represent a string as a char array (not null terminated) and an explicit length paired with it:

typedef struct string_s {
  char *text;
  size_t len;
} STRING;

And in fact you'll find systems written in C that do this.

The downside is that they can't use the standard library for concatenation, I/O, etc. They need to supply their own. Also, size_t can be as large as 8 bytes, while a terminating null is only one. When C was invented, that difference was a fairly big deal. In some applications (like very small embedded processors), it still is.

Gene
  • `char *text; size_t len;`, a pointer and size, is not quite comparable to C's _string_, an array-like entity (not a pointer) as the `struct` adds another level of indirection. It is that memory management, that even today, adds significant complexity. Pseudo code `size_t len; char text[len];` is a more comparable idea. – chux - Reinstate Monica Aug 10 '23 at 15:46

Why do we need a null terminator

To indicate the length.


When using functions on 1) a string or 2) an array, the function cannot receive the string or the array. It can receive a pointer to the string or the array. It will be a pointer to the first character of the string or array.

Now how does the function know how long the string or array is?


With strings, the function finds the length by inspecting the data: when it detects a null character, it knows that is the end of the string. No additional parameter needs to be sent to the function.

foo_string(pointer_to_string_beginning);

With arrays, the caller needs to send the array's element count to the function in addition to the pointer (in whichever order the function prescribes). The function cannot use the array's data to find the end, as no value is reserved to indicate the "end".

foo_array(element_count_of_the_array, pointer_to_array_beginning);

If sending two parameters is OK, use an array and a size. Otherwise, for text, use one parameter: a string.

For text, strings are the common approach used since the 1970s.


How to return values that indicate a string or array is the next concern, not yet addressed here.

jarmod
chux - Reinstate Monica

Short answer: To be able to store short strings in a bigger array.

Explanation:

Assume you have (one way or another) allocated a memory area capable of holding M characters and you want to store a string into that memory.

If the string has exactly M characters you can print it like:

for (i = 0; i < M; ++i) putchar(str[i]);

In principle, that's no problem. You know the value M from the size of the memory area (note: this is only true in some cases, but for now let's assume that).

But what if you want to store and later print a string with N (N < M) characters in that memory?

When printing it, you could of course do:

for (i = 0; i < N; ++i) putchar(str[i]);

But from where do you get the value N?

Sometimes N is 5 (e.g. the string "Hello"), sometimes N is 13 (e.g. the string "stackoverflow"), and so on.

One solution would be to keep N in a separate variable that you update whenever you change the string.

Another solution would be to use a sentinel value to indicate "End of string" and store that special value as part of the string.

There are pros and cons of both solutions.

The designers of C decided to go with the second solution. Consequently, we must always make sure to include the sentinel (the NUL) when dealing with strings in C.

The print can now be written:

for (i = 0; str[i] != '\0'; ++i) putchar(str[i]);

and it will work no matter what length the string has.

BTW:

Interesting read: https://stackoverflow.com/a/1258577/4386427

Support Ukraine