1

How is an array stored in memory in this program? What happened here? How should I understand this behaviour in C? (Is it undefined, unspecified, or implementation-defined behaviour?)

#include <stdio.h>

int main()
{
    char a[5] = "world";
    char b[16] = "haii how are you";

    printf("string1 %s\nstring2 %s\n", a, b);

    return 0;
}

output:

user@toad:~$ gcc -Wall simple.c
user@toad:~$ ./a.out 
string1 world
string2 haii how are youworld
user@toad:~$ 

But this works fine:

char a[5] = "world";
char b[17] = "haii how are you?"; 
sakthi
  • 6
    Undefined behavior because printf `%s` takes a string, but `a` and `b` aren't strings: They aren't `'\0'` terminated. – melpomene Aug 11 '16 at 08:26
  • 1
    Remember that the size a string requires is *number of characters + 1 for the nul character*. Just remove the size from the array definition and the compiler calculates the correct size: `char a[] = "world"; // allocates 6 elements`. – user694733 Aug 11 '16 at 08:29
  • @LưuVĩnhPhúc Did you read the Title? ==>> `How is an array stored in memory in this program?` How does this make it DUP? Where asked the OP that? – Michi Aug 11 '16 at 08:52
  • 1
    @Michi the title does not reflect the actual problem OP shows in the question body – M.M Aug 11 '16 at 09:30

6 Answers

3

Neither of the arrays in the first snippet is null-terminated. They are just character arrays, not null-terminated strings.
printf with the %s specifier expects a pointer to a null-terminated string as its argument. Passing anything else invokes undefined behavior.

printf writes characters to standard output until it encounters a '\0' character. If there is no '\0', it reads past the end of the array. Since a and b are not null-terminated, it may be that after writing b to the terminal, printf keeps searching for '\0' and finds one only after the contents of a.

haccks
  • The OP's title says `How is an array stored in memory in this program?`. I agree that the code does not match the title, but what would one answer if the title stayed the same and the code were [like this](http://ideone.com/AXClWb)? Because somehow I believe that this is what the OP was asking – Michi Aug 11 '16 at 08:47
  • @Michi; I didn't get your point. Could you give some more information about what you are asking? – haccks Aug 11 '16 at 09:34
  • The OP asks one thing in the title and something else in the body. It is hard to know the real question here :) – Michi Aug 11 '16 at 09:55
2

As per the C11 standard (6.7.9 Initialization).

    EXAMPLE 8
The declaration
char s[] = "abc", t[3] = "abc";

defines ‘‘plain’’ char array objects s and t whose elements are initialized with character string literals.

This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines p with type ‘‘pointer to char’’ and initializes it to point to an object with type ‘‘array of char’’
with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to
modify the contents of the array, the behavior is undefined.

Applying this to your declarations:

char a[5] = "world";
char b[16] = "haii how are you";

At the end of the "world" and "haii how are you" the '\0' is not added. So while using printf it searches for '\0' and prints both 'a' and 'b'.

  • "At the end of the "world" and "haii how are you" the '\0' is not added. So while using printf it searches for '\0' and prints both 'a' and 'b'." -- That's not guaranteed, C does not make any promises that these arrays follow after another. – ljrk Aug 11 '16 at 09:48
  • Arrays declared within a function (and not declared with the 'static' keyword, which would give them static storage duration) are going to be put on the stack. So one will be placed right above the other. – Lakshmanan G Aug 18 '16 at 12:51
  • The C standard has no notion of what a "stack" is. – ljrk Aug 18 '16 at 13:03
1

The code is not correct. Some compilers refuse to compile it:

> clang++ test.cxx 
test.cxx:5:14: error: initializer-string for char array is too long
        char a[5] = "world";
                    ^~~~~~~
1 error generated.

Maybe your compiler just ignores the array size, assigning the address of the string constant to it and keeping the null character at the end.

Anton Malyshev
  • 2
    Why should it not be correct? [Why should this be wrong](http://ideone.com/AXClWb)? – Michi Aug 11 '16 at 08:49
  • Yep, for me this is a compiler-bug, at least if this also occurs with *clang* and not *clang++* -- I don't know the C++ spec but I'm pretty sure the code @Michi linked is correct C99 – ljrk Aug 11 '16 at 09:00
  • 2
    `C99 6.7.8/14: An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.` – Michi Aug 11 '16 at 09:04
  • 1
    No, it fails with clang++ only, clang compiles it perfectly. Seems to be one of the differences between C and C++? – Anton Malyshev Aug 11 '16 at 09:08
  • @AntonMalyshev Same in C11 6.7.9/14 `An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.` – Michi Aug 11 '16 at 09:12
  • @AntonMalyshev Yep, seems so. g++ also fails – ljrk Aug 11 '16 at 09:15
  • @Michi C99-standard is not the C++ standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf Section 8.5.2/2 – ljrk Aug 11 '16 at 09:26
  • Wrapped it all up in my answer, plus adding memory layout. – ljrk Aug 11 '16 at 09:35
  • @larkey True, but this is a `C` question and not `C++` one. – Michi Aug 11 '16 at 09:57
  • @Michi Yep, but I thought you meant it as a reply to Anton's comment on it being a C/C++ difference :) – ljrk Aug 11 '16 at 10:11
1
char a[5] = "world";

"world" has 5 characters, and when you initialize the array with it you must leave room for the null character, \0, that ends the string.

If you define it like this:

char a[6] = "world";

then the compiler puts the null character at the end for you.

As for your question: some compilers (C++ ones, for example) don't allow you to define char a[5] = "world", but it seems that your memory is laid out like this:

[ h ] [ a ] [ i ] [ i ] [ ] [ h ] [ o ] [ w ] [ ] [ a ] [ r ] [ e ] [ ] [ y ] [ o ] [ u ] [ w ] [ o ] [ r ] [ l ] [ d ] [ \0 ]

Finally, you must know that %s prints characters until it reaches the null character, \0.

Berkay92
  • No, it's unlikely that there's a \0 just directly at the end -- in my case the layout is actually "world\x7f\0" -- \7f is just not printed. – ljrk Aug 11 '16 at 09:47
1

How is an array stored in memory in this program?

First, notice that your program has undefined behavior: you call printf with %s on char arrays that are not strings, since neither array has room for the terminating zero. For instance, you reserve only 5 chars for a, and "world" takes up all 5, leaving no room for the terminator.

A strict person would say that due to UB it makes no sense to speculate about what is going on - with UB anything can happen.

But if we do it anyway then it is likely as described below.

The answer depends on your system, as C doesn't specify all aspects of storing data. It is specified that an array must occupy contiguous memory, but exactly how and where that memory is located is beyond the standard.

From the output you have, it seems that your system has laid it out like this:

haii how are youworld
^               ^
b               a

You can't know what is after the last d.

However, when you print a you get the output world which tells us that "by luck" there is a '\0' just after the last d.

haii how are youworld'\0'
^               ^      ^
b               a      "luck"

So printing a will give world and printing b will give haii how are youworld.

Your code should be:

char a[6] = "world";
char b[17] = "haii how are you";

to make room for the terminator of each string, so that your memory layout would likely be

haii how are you'\0'world'\0'
^                   ^      
b                   a

Note: The '\0' that you got "by luck" is probably there because your system initializes all memory assigned to your program to zero at start-up.

Support Ukraine
  • No, it's unlikely that there's a `\0` just directly at the end -- in my case the layout is actually `"world\x7f\0"` -- `\7f` is just not printed. – ljrk Aug 11 '16 at 09:44
  • @larkey - It is indeed likely, as many systems zero-initialize memory at start-up. But it isn't something you can count on. Your system may not do it that way. Others may. As I wrote: it's system dependent. Also, you can't trust that 0x7f isn't printed - on my system it would be. – Support Ukraine Aug 11 '16 at 11:14
  • But you assume that the `'\0'` got there by luck -- while it may just have been eg. a "\7f\0". It's just that you do simply not notice this. It's likely that it works but unlikely that the byte just after the `'d'` is a `'\0'`. I tried on all my machines and I often got 1/2 bytes of garbage (that's not printed though) followed by the terminating '\0' byte. – ljrk Aug 11 '16 at 11:17
1

As already mentioned by others, the array declaration itself is completely conforming (you declare an array of chars, not a string):

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

(Section 6.7.8/14 of C99 standard n1256.pdf; thanks to @Michi for pointing out the paragraph)

Trying to print these using %s is undefined, though. However, you can specify the maximum number of characters to print in the format string (%.5s); this way you'd be OK again.


Concerning the memory layout: C does not make many promises about how memory is actually laid out. The only thing I can find is:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called ‘‘array of T’’. The construction of an array type from an element type is called ‘‘array type derivation’’.

(Section 6.2.5/20 of C99 standard n1256.pdf)


Note however, that in C++ even the code

char test[5] = "12345";

is illegal:

There shall not be more initializers than there are array elements. [ Example:

char cv[4] = "asdf";    // error

is ill-formed since there is no space for the implied trailing ’\0’. — end example ]

(Section 8.5.2/2 of C++ 14 standard n4296.pdf)

ljrk