-2

I have been learning C, and coming from higher-level languages, it definitely is something new! I know that arrays are terminated by a '\0' value, and I was wondering what are the instances in which the compiler does not provide this? For example:

#include <stdio.h>

int main(void) {
    int val;
    scanf("%d", &val);
    char arr[val + 1]; 
    arr[val] = '\0'; // The null value must be stated or something unpredictable occurs  
    for (int i = 0; i < val; i++ ) {
        arr[i] = 'a';
    }
    printf("This is the output:\n%s", arr);

    char arr2[5]; // This works perfectly fine
    for (int i = 0; i < 5; i++ ) {
        arr2[i] = 'a';
    }
    printf("This is the output:\n%s", arr2);
}

My understanding so far is that the compiler can't or doesn't supply the ending null value when we either

  1. dynamically allocate memory, or
  2. don't know the size of an array at compile time.

If this is the case, are there other cases to be aware of when working with arrays?

EDIT: As was pointed out, arr2 is not given a terminating zero here, although it prints what I would expect.

joshpetit
  • I do not understand. It's clearly the programmer's fault that `arr2` is not zero terminated. It's not the compiler's fault, it's the programmer's fault. – KamilCuk Dec 08 '20 at 23:01
  • @KamilCuk really? I've run the program around a dozen times and `arr` provides a weird output while `arr2` provides the same output every time. I may be mistaken about when arrays are/aren't zero terminated. – joshpetit Dec 08 '20 at 23:04
  • Does this answer your question? https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior Doing `printf("%s", arr2)` is undefined behavior. – KamilCuk Dec 08 '20 at 23:05
  • @KamilCuk Oh ok, so even though it prints exactly what I anticipated (or at least makes sense to me), it's something that's undefined by the standard. This makes sense. So would it be a best practice to always try to zero terminate arrays? – joshpetit Dec 08 '20 at 23:10
  • The compiler will automatically add a null on the end of literal strings. The rest is up to you, that’s one reason why it’s so easy to get segfault and undefined behaviour using C. Your code for declaring `arr2` and copying to it is a perfect example of how to do this wrongly. – DisappointedByUnaccountableMod Dec 08 '20 at 23:10
  • The best (and only) practice is to write code with defined behavior. There's no point in zero terminating something if it's never going to be used. You could `printf("%.*s", 5, arr2)` and be happy. – KamilCuk Dec 08 '20 at 23:10
  • @barry Ok, so is that the only instance in which the compiler zero terminates arrays for you? – joshpetit Dec 08 '20 at 23:11
  • I also do not get the question `What are the cases in which the C compiler does not provide an ending null value?`. Compiler translates code, from one language to another. Programmer writes the code so programmer, always, provides null value on the end of the string. – KamilCuk Dec 08 '20 at 23:14
  • @KamilCuk I meant in instances such as when you use a string literal, unless I am misunderstanding the transaction there. It appears to me that the compiler provides the ending zero value for you. – joshpetit Dec 08 '20 at 23:16
  • Undefined behavior is undefined. You can't reason about undefined behavior. – KamilCuk Dec 08 '20 at 23:17
  • @KamilCuk Yes I agree completely, thank you a lot for the pointers. – joshpetit Dec 08 '20 at 23:19
  • To see why `arr2` is wrong, try changing the declaration to `char arr1[5] = "hello", arr2[5], arr3[5] = "world";` – user3386109 Dec 08 '20 at 23:20
  • @user3386109 Yea I'm starting to see the way string literals are viewed in C. After running your examples a few dozen times I got the same output, but I can see and understand why the behavior is still unpredictable and wrong, thank you! I'm going to try and make sure I stick as close to the standards as possible as I learn, your help is much appreciated. – joshpetit Dec 08 '20 at 23:28
  • "This works perfectly fine". [Famous last words](https://godbolt.org/z/3r9cxW). – n. m. could be an AI Dec 08 '20 at 23:45
  • @n.'pronouns'm. Lesson learned – joshpetit Dec 09 '20 at 00:04

3 Answers

1

(I'm not sure if this answers your question, but I think it does. If not, tell me to delete it, and I'll do so.) The compiler only provides this when you use a string literal, for example to initialize an array:

const char str[] = "Hello, world!";

At any other time (for all intents and purposes, I think) the compiler doesn't do this for you. Some functions (inside the standard library and outside) will add a \0 for you, but the best way to find out is to check the documentation.
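As a small illustration of the string-literal case, here is a sketch in C. The function names are made up for this example. It shows that an array sized from a literal gets one extra byte for the terminator, and that an array declared with exactly as many elements as the literal has characters is valid C but ends up with no terminator at all.

```c
#include <stddef.h>
#include <string.h>

/* Size of an array whose length is deduced from the literal "Hello":
   5 letters plus the '\0' the compiler appends. */
size_t literal_array_size(void) {
    const char str[] = "Hello";
    return sizeof str;
}

/* Returns 1 if the last element of that array is the terminator. */
int literal_is_terminated(void) {
    const char str[] = "Hello";
    return str[sizeof str - 1] == '\0';
}

/* Valid C (though not C++): the array has room for the 5 letters only,
   so the terminator is dropped. Returns 1 if no '\0' is present. */
int exact_fit_has_no_terminator(void) {
    const char tight[5] = "Hello";
    return memchr(tight, '\0', sizeof tight) == NULL;
}
```

Passing an array like `tight` to `printf("%s", ...)` would be undefined behavior, for the same reason as `arr2` in the question.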

Isacc Barker
1

Arrays are not generally terminated by '\0'

Strings, which are a specific kind of character array, are by definition terminated by '\0'. This means a C-style string is a "char array" that is terminated by '\0'. This is true even if:

  • The first '\0' is encountered before the end of the allocated "char array"
  • The first '\0' is encountered as the last element of the allocated "char array"
  • The first '\0' is encountered after the last element of the allocated "char array"

Notice that last one: it causes all sorts of trouble. This means it is up to you as a developer to ensure that there is a '\0' at the end of any string created while your program runs. A string literal like "hello" will have its character array end in a '\0' because the compiler creates the backing array and inserts that character at the end for you.
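One way to make "it is up to you" concrete is a helper that refuses to write unless there is room for the terminator. This is only a sketch; `fill_string` and its parameters are hypothetical names, not anything from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `count` copies of `ch` into `buf` and
   terminates the result. `buf_size` is the total size of `buf` in bytes.
   Returns 0 on success, or -1 if there is no room for the characters
   plus the '\0'. */
int fill_string(char *buf, size_t buf_size, char ch, size_t count) {
    if (buf == NULL || count >= buf_size) {
        return -1;              /* the terminator would not fit */
    }
    memset(buf, ch, count);
    buf[count] = '\0';          /* the programmer supplies the terminator */
    return 0;
}
```

With a 6-byte buffer, `fill_string(buf, sizeof buf, 'a', 5)` succeeds and `buf` can safely be printed with `%s`. With a 5-byte buffer (the `arr2` situation from the question) the same call returns -1 instead of leaving an unterminated array behind.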

Mistakes in developing software include not ensuring that, as you change a string, all your strings end in '\0'. If you write a mistake like that into your code, there is a non-zero chance that some string operation (the non-safe ones) will just keep reading until it happens upon a '\0' or walks out of the program's assigned memory.

The first case can cause a security hole (a buffer over-read, which can expose adjacent memory) in addition to incorrect program function. The second case will likely cause your program to terminate with a segmentation fault.

Edwin Buck
1

> I know that arrays are terminated by a '\0' value,

No, strings are terminated by a zero byte (which can be expressed as '\0'). This is, in fact, among the defining characteristics of strings. But in C, strings are data, not a data structure. There is no "string" data type, and although the distinction is a bit fine, the array containing a string is not the same thing as that string itself. Among other things, it may be that the array is larger than the string contents + terminator. Arrays are not guaranteed to be terminated by any special value, except inasmuch as their contents may dictate (the array containing a string literal, for example).

> and I was wondering what are the instances in which the compiler does not provide this?

The compiler is responsible for ensuring termination of string literals inside the arrays it automatically provides to contain them. It will also default-initialize otherwise uninitialized arrays declared at file scope or as static, which means initialization to zero for elements of arithmetic type, including character type. It will also default-initialize uninitialized members of any array, including automatic ones, when there is an explicit initializer (not an assignment!) covering only some members. The compiler has no other responsibility for providing zero termination.
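The two zero-initialization rules just described can be sketched as follows. The identifiers are invented for the example; the point is only to show a file-scope array and a partially initialized automatic array, both of which start with zeros in every element that was not explicitly initialized.

```c
#include <stddef.h>

/* Static storage duration and no initializer: every element starts as 0. */
static char file_scope_buf[8];

/* Returns 1 if all `n` bytes starting at `p` are zero. */
int all_zero(const char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}

int static_array_is_zeroed(void) {
    return all_zero(file_scope_buf, sizeof file_scope_buf);
}

/* An explicit initializer covering only 2 of 8 elements:
   the remaining 6 are initialized to zero. */
int partial_initializer_zero_fills(void) {
    char partial[8] = { 'a', 'b' };
    return partial[0] == 'a' && partial[1] == 'b'
        && all_zero(partial + 2, sizeof partial - 2);
}
```

An automatic array with no initializer at all, such as `char arr2[5];` in the question, gets neither treatment. Its contents are indeterminate until assigned, so the sketch deliberately does not read one.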

For the most part, the string functions of the standard library depend on their inputs to be zero-terminated (that is, to be bona fide strings), and they provide zero-terminated output strings. But there are exceptions.
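One well-known exception is `strncpy()`: when the source has `n` or more characters before its terminator, exactly `n` characters are copied and no '\0' is written. The sketch below demonstrates that and shows one common workaround. `copy_terminated` is a hypothetical wrapper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Copies with strncpy and reports whether the first `n` bytes of `dst`
   contain a terminator afterwards (1 = terminated, 0 = not). */
int strncpy_terminates(char *dst, size_t n, const char *src) {
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}

/* Hypothetical wrapper: copies at most dst_size - 1 characters and always
   terminates. Returns 1 if `src` had to be truncated, 0 otherwise. */
int copy_terminated(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 1;                    /* nothing can be stored */
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';        /* add the terminator ourselves */
    return strlen(src) >= dst_size;
}
```

Copying "abcdef" into a 4-byte buffer with plain `strncpy` leaves the buffer unterminated, while `copy_terminated` stores "abc" and reports the truncation.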

Similarly, for the most part, I/O functions that accept string inputs expect bona fide zero-terminated strings, and those that provide string outputs ensure zero-termination. But again, there are exceptions.

When it successfully allocates memory, the standard library function calloc() zero-initializes it. malloc() and realloc() do not do this.
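For the dynamically allocated case from the question, the difference might look like this. Both helper names are made up for the sketch. With `malloc()` the terminator has to be written explicitly, whereas with `calloc()` the byte after the filled characters is already zero.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated string of `count` copies
   of `ch`, or NULL on failure. The caller must free() the result. */
char *make_repeated(char ch, size_t count) {
    if (count == SIZE_MAX) {
        return NULL;                 /* count + 1 would overflow */
    }
    char *s = malloc(count + 1);
    if (s == NULL) {
        return NULL;
    }
    memset(s, ch, count);
    s[count] = '\0';                 /* malloc'd bytes are indeterminate */
    return s;
}

/* Same result using calloc: the memory starts zeroed, so the byte after
   the filled characters already serves as the terminator. */
char *make_repeated_calloc(char ch, size_t count) {
    if (count == SIZE_MAX) {
        return NULL;
    }
    char *s = calloc(count + 1, 1);
    if (s == NULL) {
        return NULL;
    }
    memset(s, ch, count);
    return s;
}
```

Relying on `calloc()` for termination only works as long as nothing later overwrites that final byte, so writing the terminator explicitly is often the clearer choice.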

That covers most of the standard library functions relevant to the question, I think, but this is something you need to pay attention to. Generally, though, function documentation will be explicit about any cases where the function accepts or produces unterminated "strings".

John Bollinger