21

For some reason, my second character array (var2) merges with the first one (var1). Here is my code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()     
{
  char var1[5] = "Hello";
  char var2[5] = "World";

  printf("This program can write:\t%s\t%s\n", var1, var2);
  getch();
  return 0;
}

After compiling it, I got the following output:

This program can write: Hello WorldHello

When I changed the code to printf("This program can write:\t%s\n", var2); I got the following output:

This program can write: WorldHello

So it's clear that var1 is merging with var2.

Is this some kind of compiler bug? If so, how can I fix it? I tried reinstalling MINGW, but I'm still getting the same results.

Thanks a lot

OMAX
  • 9
    Do not specify the array size. Let the compiler size it to 6 with `char var1[] = "Hello";` which includes the trailing null character. – chux - Reinstate Monica Dec 14 '15 at 16:13
  • 12
    Most compilers should have produced a warning about that, btw. – Dummy00001 Dec 14 '15 at 16:24
  • 1
    Your arrays need one more character added to the end to null-terminate them. printf will look for the end of the string using the NULL character, which the string "Hello" does not have. Try changing `var1` to be size 6 and change "Hello" to "Hello\0". Then do the same to `var2` – Gophyr Dec 14 '15 at 16:26
  • Side question: Are you guaranteed that var2 and var1 will be stored in consecutive memory? – sudo rm -rf slash Dec 14 '15 at 21:39
  • 3
    Rule 1 of compiler bugs: It's not actually the compiler ;) – LordAro Dec 14 '15 at 23:12
  • @JosephMalle Nope. It depends on whether the stack pointer increases (low to high memory) or decreases (high to low memory) as you push to the stack. See [this discussion](http://stackoverflow.com/questions/1677415/does-stack-grow-upward-or-downward). The behavior of the stack is actually quite important in answering this question (specifically why the strings get merged) and I'm surprised nobody has touched on it in their answer. – Graph Theory Dec 15 '15 at 01:37

3 Answers

36

Strings in C are one-dimensional arrays of characters terminated by a null character '\0'. A null-terminated string therefore contains the characters that make up the string, followed by a null character.

The following declaration and initialization create a string consisting of the word "Hello". To hold the null character at the end of the array, the size of the character array containing the string is one more than the number of characters in the word "Hello."

char var1[6] = {'H', 'e', 'l', 'l', 'o', '\0'};

More simply, you can write:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()     
{

 char var1[] = "Hello";
 char var2[] = "World";

 printf("This program can write:\t%s\t%s\n", var1, var2);
 getchar();
 return 0;

}

The C compiler automatically places the '\0' at the end of the string when it initializes the array, provided the array is long enough to hold it. If it is not, the '\0' is dropped rather than overwriting another variable.

trinaldi
Mxsky
  • 1
    could you add this to the end of the last sentence "... if the array is long enough to contain the '\0', otherwise the '\0' will be dropped rather than overwrite another variable". Thanks :) – ztk Dec 14 '15 at 16:17
  • 2
    Ohh Yeah, I totally forgot about the '\0'. It's been years since I last programmed C. Thanks a lot. – OMAX Dec 14 '15 at 16:38
  • 4
    Only answer containing `[]`, which I appreciate (but I am not allowed to say how I have shown that appreciation). – Carsten S Dec 14 '15 at 20:20
9

When %s is used as a format specifier, printf reads characters from memory and stops when it finds a '\0' character. If the array does not contain one, printf keeps reading past the end of the array until it happens to find a '\0' somewhere in memory, which is undefined behavior.

In the snippet above, both var1 and var2 are character arrays of length 5. Since you are printing them with %s, they must be terminated with '\0'. You can do this by increasing the size of the arrays to 6; the initializer then appends a '\0' character automatically:

char var1[6] = "Hello";
char var2[6] = "World";   

See the difference between char var1[5] = "Hello"; and char var1[6] = "Hello";

+--------+--------+--------+--------+--------+
|        |        |        |        |        |  
|  'H'   |  'e'   |  'l'   |   'l'  |  'o'   |  char var1[5] = "Hello";
|        |        |        |        |        | 
+--------+--------+--------+--------+--------+





+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |
|  'H'   |  'e'   |  'l'   |   'l'  |  'o'   |   '\0' |  char var1[6] = "Hello";
|        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+
haccks
8

You forgot to include the \0 that marks the end of the string, so increasing each array's size by one will do the trick:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()     
{
  char var1[6] = "Hello";
  char var2[6] = "World";

  printf("This program can write:\t%s\t%s\n", var1, var2);
  getchar();
  return 0;
}

This prints:

This program can write: Hello World

Netwave
  • Thanks, It really helped me. – OMAX Dec 14 '15 at 16:40
  • 3
    The only reason to ever specify the array size is to avoid getting the `\0` at the end or if you want to specify some larger buffer than the string would have (although since you can't write to that additional space later without implementation defined behavior it's not great either). But in this case it's just a particularly bad idea. – Voo Dec 14 '15 at 17:40
  • @Voo what do you think is I-D? If you declare the array size larger than needed by the initializer all remaining elements are initialized to zero. Even if they were uninitialized, writing would still be defined, and reading would be *Undefined Behavior* not I-D, – dave_thompson_085 Dec 15 '15 at 04:45
  • @dave Strings are const (despite the old flaw in C that let you assign them to a non-const char*) so writing is generally UB. I'd think though that at least some compilers allow it with the right options, because it is quite common to do so. – Voo Dec 15 '15 at 06:24
  • (Although that still doesn't make it anything but UB, so my mistake. Not sure if there is any nomenclature for such a thing) – Voo Dec 15 '15 at 06:34
  • @Voo a string **literal** used as an lvalue is `const` in C++; this question is for C and in C it is not and never was `const`, but writing to it *is* UB. But a **variable** *initialized* from a string is *not* affected by this, and unless the *variable* is `const` (and this one isn't) writing to it is fine. – dave_thompson_085 Dec 16 '15 at 06:01
  • @dave The string literal isn't const in C, but writing to it is still UB. But yep you're right that since we assign it to a variable that doesn't concern us. – Voo Dec 16 '15 at 10:23