
So, I know that the last element of a string, aka char array is NULL which has the value of 0. If we define a string containing a word of 5 letters, say Stack, we would do it by doing as the following.

char word[5] = "Stack";

And if I wanted to access the first letter of the array, S, I would look for the index 0 by this: word[0], similarly for the last letter, k, I would do the index 4 by using word[4]. But something here doesn't really sit into my mind: we used the number 5 when declaring our array initially.

So my first question is, does the 5 in the declaration mean the program will use the indexes 0 to 5, saying that the index 5 will include the null character.

Now let's say that I want to define an int array which contains whatever, but for the sake of the question it shall include the odd numbers. I do that by typing this:

int odds[5] = {1, 3, 5, 7, 9};

The same goes here: does the 5 mean we will use indexes 0 through 5 and index 5 will take the NULL value? Also, as my second question, do int arrays also end with the NULL character? (Yes, it's a silly question since it's an int array, but at least, does the fifth index hold the value 0?) As my third question, to make it more general, is there a general rule which says all arrays will end with the value 0?

For the first question, I looked at the net and some references and couldn't really find any answer that clicked.

For the second one, I tried to print the value of odds[5] and it returned 0. Then when I tried to print odds[6], it returned 0 as well, so I doubted the first answer: the value I got could be a random one taken from the address right after odds[4], rather than one given at initialization.

Thanks in advance.

tindomieru
  • There is no convention for ending arrays in C. Most functions just take the length as an extra argument. Ending in null doesn't often make much sense since the array can contain 0 in the middle. Sometimes a special sentinel value is used that couldn't possibly appear, but what that value can be depends on what the valid values are. – mousetail Apr 02 '23 at 15:38
  • Also note your first example is invalid, you need `char word[6] = "Stack"` since you need to keep space for the null terminator – mousetail Apr 02 '23 at 15:40
  • Referencing past the end of an array is undefined behavior. The values could be zero, garbage, or whatever random bytes there were in that memory location. – OldProgrammer Apr 02 '23 at 15:42
  • A way that I often end arrays of pointers is with NULL but otherwise I just pass around the size with the array. – SafelyFast Apr 02 '23 at 15:44
  • thank you both. and you too, safelyfast. – tindomieru Apr 02 '23 at 15:45
  • A *string literal* such as `"Stack"` is a piece of C syntax that represents an array containing a *string*. On the other hand, a *string* is not an array at all, but rather a kind of value that an array of `char` can contain. – John Bollinger Apr 02 '23 at 16:05
  • `char word[5] = "Stack";` is not "invalid", it just doesn't do what the author may have expected. The initialization is valid and equivalent to `char word[5] = { 'S', ..., 'k' };`. The initialization `char word6[6] = "Stack"` is also valid and equivalent to `char word6[6] = {'S', ..., 'k', '\0'};` They are both valid, but they do different things. Passing `word` (unless it is changed after the initialization) to a function that expects a string, however, is invalid. – William Pursell Apr 02 '23 at 19:40
  • Also, with regard to your "first question", the [5] in the declaration means that `5*char` is reserved in memory, in this case 5 bytes, without taking into account any null character. An array initialized as `type[5] = ...` will have indices `0..4`. – Torge Rosendahl Apr 12 '23 at 03:00

4 Answers

An array in C needn't end in NULL or '\0' or 0 or any other special value. Therefore char arrays (type char[]) needn't either.

However, an object of type char[] won't be considered a "string" in C if it doesn't end in the char '\0' (which is by the way completely equivalent to 0). In fact, string functions in C generally assume a string-final '\0'; if it is missing, your code will break.

Incidentally, you should write one of the following instead

char word[5+1] = "Stack";
char word[] = "Stack"; // array size computed by compiler

(as another answerer and a commenter already pointed out). The second form is recommended and less error-prone if you ever change your code.

For a T[N] array (where T is any type), the indices used are 0 to N-1.
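To make the 0 to N-1 rule concrete, here is a minimal sketch using the `odds` array from the question. The `ARRAY_LEN` macro and the function name `last_odd` are made up for this illustration; neither comes from the standard library.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). Hypothetical helper. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Returns the last element of a 5-element array. */
int last_odd(void) {
    int odds[5] = {1, 3, 5, 7, 9};
    size_t n = ARRAY_LEN(odds);   /* n == 5 */

    /* Valid indices are 0 .. n-1, so the last element is odds[4]. */
    return odds[n - 1];
}
```

Here `last_odd()` returns 9, the value stored at index 4. There is no `odds[5]`.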

If you initialize a char array with a string literal, an implicit '\0' is added automatically to the end because here

char str[] = "hat";
char str[] = { 'h', 'a', 't', '\0' };

the first line is equivalent to the second line by definition (the compiler automatically deduces the array size to be 4). Authoritative sources for this equivalence would be:

  • K&R (2e), p86
  • C17 standard draft, 6.7.9 Initialization

which give very similar examples.
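The equivalence can also be checked directly. The following is only a sketch; the function and variable names are invented for the example.

```c
#include <string.h>

/* Returns 1 if both initializations produce the same 4-byte array. */
int literal_matches_explicit(void) {
    char from_literal[] = "hat";
    char explicit_init[] = { 'h', 'a', 't', '\0' };

    return sizeof from_literal == 4
        && sizeof explicit_init == 4
        && memcmp(from_literal, explicit_init, sizeof from_literal) == 0;
}
```

Both arrays have size 4 and identical contents, so the function returns 1.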

A note of caution: For historical reasons, you can write things like char str[3] = "hat"; (instead of char str[4] = "hat"; or char str[] = "hat";), but you really shouldn't, because the result is technically not a "string" in C.
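If you are ever unsure whether a char array actually holds a string, you can search for the terminator without reading past the array's bounds. This is a sketch built on the standard `memchr`; the helper name `holds_string` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first `size` bytes of buf contain a '\0', else 0.
   Hypothetical helper; it never reads beyond buf[size - 1]. */
int holds_string(const char *buf, size_t size) {
    return memchr(buf, '\0', size) != NULL;
}
```

For `char str[4] = "hat";`, the call `holds_string(str, sizeof str)` returns 1. For a three-element array that holds only 'h', 'a', 't', it returns 0.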

Referencing an out-of-bounds array element is undefined behavior. In your example, odds[5] and odds[6] evaluate to 0 just by accident.

Lover of Structure

If we define a string containing a word of 5 letters, say Stack, we would do it by doing as the following.

No, you should do:

char word[6] = "Stack";

or even better

char word[] = "Stack";

There is no "general rule" for ending an array, but the most common method is to pass the length as an additional argument. Sentinel values have the obvious drawback that you have to choose a sentinel value from the set of possible values, and then that value cannot be used as a regular value.
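A minimal sketch of the length-as-argument convention (the function name `sum_ints` is just an illustration):

```c
#include <stddef.h>

/* Sums the first n elements of a. The caller supplies the length,
   so the array may contain any values, including 0. */
long sum_ints(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

With `int odds[5] = {1, 3, 5, 7, 9};`, the call `sum_ints(odds, 5)` returns 25. An array such as `{1, 0, 3}` works just as well, because no value is reserved as an end marker.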

klutt
  • thank you, i was so sure with my way since it worked. lol – tindomieru Apr 02 '23 at 15:46
  • The “even better” approach; is there documentation stating this is ‘preferred’, or is this personal / experience preference? – S3DEV Apr 02 '23 at 15:46
  • @rakkafunpun In C, there are very many things that "work" sometimes but are very unsafe. This is just one example. You need to read the documentation very carefully to know what every expression really does – mousetail Apr 02 '23 at 15:50
  • @mousetail Do you have any recommended reading that covers every aspect of C? Currently I'm studying the book C Primer Plus, but after I finish it I am planning to read something more advanced, for example the book by Dennis Ritchie and Brian Kernighan. – tindomieru Apr 02 '23 at 16:02
  • @S3DEV This is one of those cases where I would say that one alternative is objectively better. Do you seriously think the first option is better in any single way? – klutt Apr 02 '23 at 16:41
  • @rakkafunpun Including every aspect of the language? Read the standard. – Harith Apr 02 '23 at 17:06
  • @rakkafunpun See [The Definitive C Book Guide and List](https://stackoverflow.com/questions/562303/the-definitive-c-book-guide-and-list). My personal (shorter) answer: Aside from K&R (2e), people seem to like "C Programming: A Modern Approach (2e; King)" and "21st Century C (2e; Klemens)" for modern introductions written in a traditional style and "Head First C" for a (highly respected) introduction with many illustrations. "C in a Nutshell" (2e) is good, but it's a reference book. Check the [standards documents](https://stackoverflow.com/q/81656/2057969) to learn about details. – Lover of Structure Apr 02 '23 at 17:51

Consider this code:

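#include <stdio.h>
#include <string.h>

int main(void) {
    /* No room for the terminating '\0': neither array holds a string. */
    char string1[4] = "helo";
    char string2[4] = "bye!";
    /* Undefined behavior: strlen and %s read past the end of string1. */
    printf("%zu %s", strlen(string1), string1);
}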

See how this prints "helobye!" rather than "helo"? That's because the two arrays happened to be directly adjacent to each other in memory with no null byte in between. Note that it could print even more: there "happened" to be a null byte after the second array, but there might just as well not have been, which would lead to printing more random garbage.

For this reason, always declare a string as 1 byte longer than its text content to store the null byte:

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Four characters plus one byte for the terminating '\0'. */
    char string1[5] = "helo";
    char string2[5] = "bye!";
    printf("%zu %s", strlen(string1), string1);
}

This outputs "4 helo" (the length, then the string), as you would expect. A string in C is just any array of bytes that ends with a null terminator.

Do all arrays end with the value 0?

No. Some string functions expect to be passed a pointer to the start of an array that ends with 0, but others do not care. Most array functions just take the length as an extra argument. Some string functions can take a length too, and if so they don't need a null terminator.
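As an illustration, the standard `memcpy` takes an explicit byte count and never looks for a terminator. The sketch below builds on it to turn an unterminated char array into a proper string; the helper name `copy_bounded` and its parameter order are assumptions made for this example, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 bytes from src (which need not be
   null-terminated) into dst, then terminates dst.
   Returns the number of characters copied. Hypothetical helper. */
size_t copy_bounded(char *dst, size_t dst_size,
                    const char *src, size_t src_len) {
    if (dst_size == 0) {
        return 0;                  /* no room even for the terminator */
    }
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);           /* length-based: no '\0' needed in src */
    dst[n] = '\0';                 /* dst is now a valid string */
    return n;
}
```

Because both lengths are passed explicitly, the function never reads past `src` or writes past `dst`, even when `src` has no terminator.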

What if I omit the null terminator and use a string function?

The function will read (or write) memory past the end of the array until by chance it encounters a null byte. This is considered a "buffer overflow" bug, and can lead to, among others:

  • Segfaults and crashes
  • Outputting important secrets stored in memory
  • Nasal Demons

Writing memory without properly bounding the length is even more dangerous.

Be careful!

mousetail

There are two important facts here:

  1. Arrays in C are 0-based. So if the array has size N, the valid indices go from 0 to N-1. There isn't an element in slot [N], and in general it's an error to attempt to access an element as if there were one there.
  2. Strings in C are arrays of char, with a special convention: they are terminated by a null character, which is most definitely part of the string, but is not counted in the string's length.

So for a string (but only for a string), if the string has length n, the array must have size (at least) n+1. The actual characters in the string will run from indices 0 to n-1, and element [n] will be the null terminator.
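The n+1 rule shows up whenever you allocate storage for a copy of a string. A minimal sketch (the name `dup_string` is made up here):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
   The caller must free() the result. */
char *dup_string(const char *s) {
    size_t n = strlen(s);          /* length, not counting the '\0' */
    char *copy = malloc(n + 1);    /* n characters plus 1 terminator */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, n + 1);        /* copies the terminator as well */
    return copy;
}
```

For "Stack", `strlen` reports 5, the allocation is 6 bytes, and element [5] of the copy is the null terminator.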

Other kinds of arrays (like, arrays of int) are not necessarily terminated with a special character. It's possible to do it that way, although it's somewhat unusual.

As to your questions:

So, I know that the last element of a string, aka char array is NULL which has the value of 0.

Mostly. The last element of a string is a null character, which is ASCII "NUL" or, in C, '\0'. In C, NULL is a null pointer, which is a different sort of beast.

If we define a string containing a word of 5 letters, say Stack, we would do it by doing as the following.

char word[5] = "Stack";

No, that would not define a proper string. To define a proper string you would either need to do

char word[6] = "Stack";

to leave room for the null character, or

char word[] = "Stack";

to let the compiler compute the right size for you.

But something here doesn't really sit into my mind: we used the number 5 when declaring our array initially.

Right — by which I mean, you're right to feel uneasy, because, yes, something's wrong.

So my first question is, does the 5 in the declaration mean the program will use the indexes 0 to 5, saying that the index 5 will include the null character.

No, 5 meant the array would have indices 0 to 4, and there would be no room for the null character.

Now let's say that I want to define an int array

Okay, but as I mentioned, there's no necessary implication that there'll be a terminator in that case.

does the 5 mean we will use indexes 0 through 5 and index 5 will take the NULL value?

Again, the 5 means you'd use indices 0 to 4 for the five values you initialized the array with. If you wanted to use an explicit terminator (perhaps 0 or -1), you'd need to make the array bigger, with a size of at least 6.
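Here is a sketch of what an explicit terminator could look like for an int array, assuming -1 is chosen as the end marker and therefore never occurs as real data. The function name `count_until_sentinel` is hypothetical.

```c
#include <stddef.h>

/* Counts the elements before the first occurrence of `sentinel`.
   The array MUST contain the sentinel, or this reads out of bounds. */
size_t count_until_sentinel(const int *a, int sentinel) {
    size_t n = 0;
    while (a[n] != sentinel) {
        n++;
    }
    return n;
}
```

With `int odds[6] = {1, 3, 5, 7, 9, -1};` the call `count_until_sentinel(odds, -1)` returns 5. Note that the array needs six elements: five values plus the marker.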

is there a general rule which says all arrays will end with the value 0?

No, there is definitely no such rule.

For the second one, I tried to print the value of odds[5] and it returned 0. Then when I tried to print odds[6], it returned 0 as well, so I doubted the first answer

C doesn't generally do array bounds checking. So accessing both [5] and [6] was wrong, and there's no telling what values you'd get — just about the only guarantee is that you're not guaranteed to get an error message or anything. If you access an array out of bounds, it'll often seem to work (you won't get an error message), and sometimes the value will even seem to be reasonable, but it's not guaranteed.
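Since the language won't check for you, you can do the check yourself before indexing. A minimal sketch, with a made-up helper name `get_checked`:

```c
#include <stddef.h>

/* Stores a[i] in *out and returns 1 if i is a valid index (i < n).
   Returns 0 and leaves *out untouched otherwise. */
int get_checked(const int *a, size_t n, size_t i, int *out) {
    if (i >= n) {
        return 0;                  /* out of bounds: refuse to read */
    }
    *out = a[i];
    return 1;
}
```

For the five-element `odds` array, index 4 succeeds and yields 9, while indices 5 and 6 are rejected instead of returning whatever happens to be in memory.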

Steve Summit