1

I have been working with strings in C. While working with ways to declare them and initialize them, I found some weird behavior I don't understand.

#include<stdio.h>
#include<string.h>

int main()
{
    char str[5] = "World";
    char str1[] = "hello";
    char str2[] = {'N','a','m','a','s','t','e'};
    char* str3 = "Hi";

    printf("%s %zu\n"
           "%s %zu\n"
           "%s %zu\n"
           "%s %zu\n",
           str, strlen(str),
           str1, strlen(str1),
           str2, strlen(str2),
           str3, strlen(str3));

    return 0;
}

Sample output:

Worldhello 10
hello 5
Namaste 7
Hi 2

In some cases, the above code makes str contain Worldhello, and the rest are as they were initialized. In some other cases, the above code makes str2 contain Namastehello. It happens with different variables that I never concatenated. So, how are they getting combined?

Gabriel Staples
  • 6
    `"World"` requires at least 6 characters of space, 5 for the letters and 1 for the terminator. – Retired Ninja Sep 02 '22 at 01:53
  • 1
    You are misinterpreting things. The convention for strings is that their end is marked by a terminating null character, '\0'. Your str array is too short, so the terminator is truncated off the end. Later, printf rampages through memory until it finds something that looks like a null terminator. – Avi Berger Sep 02 '22 at 01:54
  • 1
    `strlen` returns a `size_t` (an unsigned integer type); you need to use the `%zu` conversion specifier when printing these values to avoid undefined behavior. – ad absurdum Sep 02 '22 at 01:56
  • 3
    Note that neither `str` nor `str2` is actually a string — they are merely byte arrays. That means that you cannot pass them to functions that expect to be passed strings — specifically, neither `printf()` nor `strlen()`. (With enough care, you could pass them to `printf()`, but the conversion specification would need to be more complex than `%s` — `%.*s` would do the job as long as you specified the length (as an `int`, not a `size_t`) before passing the byte array.) – Jonathan Leffler Sep 02 '22 at 02:28
  • The practical reason a C-string is defined as being terminated by the nul-character `'\0'` (or just plain `0`) is so all string functions and functions expecting a C-string know where to stop processing characters. That's why there is no need to provide a length to `strcpy()`, but you must with `memcpy()` -- `strcpy()` knows where the string ends when it reaches the nul-character. That's why if you have the string (actually string-literal) `char *s = "Hello World";`, you can loop over each character with a simple loop `for (int i = 0; s[i]; i++) { /* do something with each char */ }`. – David C. Rankin Sep 02 '22 at 03:56
  • Does this answer your question? [C language scanf copies extra string](https://stackoverflow.com/questions/52709508/c-language-scanf-copies-extra-string) – autistic Sep 05 '22 at 05:13
  • Also possibly a duplicate of [How should character arrays be used as strings?](https://stackoverflow.com/questions/58526131/how-should-character-arrays-be-used-as-strings) – autistic Sep 09 '22 at 22:19

3 Answers

3

To work with strings, you must allow space for a null character at the end of each string. Where you have char str[5]="World";, you allow only five characters, and the compiler fills them with “World”, but there is no space for a null character after them. Although the string literal "World" includes an automatic null character at its end, you did not provide space for it in the array, so it is not copied.

Where you have char str1[]="hello";, the compiler determines the array size by counting the characters, including the null character at the end of the string literal.

Where you have char str2[]={'N','a','m','a','s','t','e'};, there is no string literal, just a list of individual characters. The compiler determines the array size by counting those. Since there is no null character, it does not provide space for it.

One potential consequence of failing to terminate a string with a null character is that printf will continue reading memory beyond the string and printing characters from the values it finds. When the compiler has placed other character arrays after such an array you are printing, characters from those arrays may appear in the output.

If you allow space for a null character in str and provide a zero value in str2, your program will print strings in an orderly way:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[6] = "World"; // 5 letters plus a null character.
    char str1[] = "hello";
    char str2[] = {'N', 'a', 'm', 'a', 's', 't', 'e',  0}; // Include a null.
    char *str3 = "Hi";
    printf("%s %zu\n%s %zu\n%s %zu\n%s %zu\n",
        str,  strlen(str),
        str1, strlen(str1),
        str2, strlen(str2),
        str3, strlen(str3));
    return 0;
}
Eric Postpischil
0

Undefined behavior in non-null-terminated, adjacently-stored C-strings

Why do you get this part:

Worldhello 10
hello 5

...instead of this?

World 5
hello 5

The answer is that printf() prints chars until it hits a null character, which is a binary zero, frequently written as the '\0' char. The compiler happens to have placed the character array containing hello right after the character array containing World. Since you explicitly forced the size of str to be 5 via str[5], there was no room for the null character that normally ends the string, so it was not stored. With hello happening to be (not guaranteed to be) right after World, printf() printed World, saw no terminating null character, and continued straight into the hello array. This resulted in it printing Worldhello and stopping only at the null character after hello, which is properly terminated.

Reading past the end of str is undefined behavior, which is a bug: the output cannot be relied upon. But that is the explanation for this case.

Run it with gcc on a 64-bit Linux machine online here: Online GDB: undefined behavior in NON null-terminated C strings

@Eric Postpischil has a great answer and provides more insight here.

Gabriel Staples
-1

From the C tag wiki:

This tag should be used with general questions concerning the C language, as defined in the ISO 9899 standard (the latest version, 9899:2018, unless otherwise specified — also tag version-specific requests with c89, c99, c11, etc).

You've asked a "how?" question about something that none of those documents defines, and so the answer is undefined in the context of C. You can only experience this phenomenon through undefined behaviour.

how are they are getting combined?

There is no requirement that any of these variables be "combined" or be located immediately after each other; trying to observe that is undefined behaviour. It may appear to coincidentally work (whatever that means) for you at times on your machine, while failing at other times or when using some other machine or compiler, etc. That's purely coincidental and not to be relied upon.

In some cases, the above code assigns str with Worldhello and the rest as they were intitated.

In the context of undefined behaviour, it makes no sense to make claims about how your code functions; as you've already noticed, the functionality is erratic.

I found some weird Behaviour with them.

If you want to prevent erratic behaviour, stop invoking undefined behaviour by accessing arrays out of bounds (i.e. causing strlen to run off the end of an array).

Only two of those variables (str1 and str3) are safe to pass to strlen; for the others, you need to ensure the array contains a null terminator.

autistic
  • 1
    Undefined behavior is not necessarily erratic. It might be perfectly predictable 100% of the time for a given version of compiler and chunk of code. "Undefined" means simply that the language standard doesn't dictate what should happen, and therefore the behavior cannot be safely relied upon. So yes, undefined behavior can frequently still be very well explained. – Gabriel Staples Sep 02 '22 at 03:05
  • @GabrielStaples it certainly is erratic in the scope of the language. By mentioning any particular compiler you commit the fallacy of authority; no compiler defines the C programming language... and I quote, "Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)." – autistic Sep 02 '22 at 03:09
  • 2
    The fact that the C standard does not define a behavior does not mean nothing else helps us understand what happens. It simply means this one thing, the C standard, is not applicable. When we have knowledge of other things, such as how compilers operate, how programs are laid out in memory, how `printf` is implemented, and general design and engineering principles, we can often find reasons for behaviors that are not explained by the C standard… – Eric Postpischil Sep 02 '22 at 03:10
  • @GabrielStaples care to show me where from the C standard it says that undefined behaviour "might be perfectly predictable"? Because to me it seems that quote above says the opposite, using the specific word... "unpredictable"... – autistic Sep 02 '22 at 03:11
  • 3
    When we see concatenated arrays upon asking `printf` to print an unterminated array, we know with great confidence it is a horse, not a zebra: It is caused by the compiler laying out the arrays consecutively in memory, not by `printf` printing random characters that just coincidentally happened to have the same values as the characters in the second array. – Eric Postpischil Sep 02 '22 at 03:11
  • @EricPostpischil so we should not answer a C-tagged question in the realms of C, and instead from the perspective of something that's outside of the realms of C? – autistic Sep 02 '22 at 03:12
  • 1
    @autistic: The C standard does not say undefined behavior is unpredictable. It says that possible undefined behavior includes unpredictable results, and it says that only in the context of describing what the standard **permits**, so it is not even a guarantee that unpredictable results are ever an actual possibility. – Eric Postpischil Sep 02 '22 at 03:13
  • 1
    @autistic: Yes. – Eric Postpischil Sep 02 '22 at 03:13
  • @EricPostpischil ... yet there is no such requirement in standard C, and you're using a compiler that doesn't define C as though it's an authority in the answer... This question isn't tagged gcc, clang or msvc. Stop tag spamming. – autistic Sep 02 '22 at 03:13
  • @autistic: No such requirement as what? I have not cited any compiler’s behavior as authoritative about the C standard. Entering comments is not tag spamming. – Eric Postpischil Sep 02 '22 at 03:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247744/discussion-between-autistic-and-eric-postpischil). – autistic Sep 02 '22 at 03:16