
I'm reading C++ Primer Plus by Stephen Prata. He gives this example:

char dog[8] = { 'b', 'e', 'a', 'u', 'x', ' ', 'I', 'I'}; // not a string!
char cat[8] = {'f', 'a', 't', 'e', 's', 's', 'a', '\0'}; // a string!

with the comment that:

Both of these arrays are arrays of char, but only the second is a string. The null character plays a fundamental role in C-style strings. For example, C++ has many functions that handle strings, including those used by cout. They all work by processing a string character-by-character until they reach the null character. If you ask cout to display a nice string like cat in the preceding example, it displays the first seven characters, detects the null character, and stops. But if you are ungracious enough to tell cout to display the dog array from the preceding example, which is not a string, cout prints the eight letters in the array and then keeps marching through memory byte-by-byte, interpreting each byte as a character to print, until it reaches a null character. Because null characters, which really are bytes set to zero, tend to be common in memory, the damage is usually contained quickly; nonetheless, you should not treat nonstring character arrays as strings.

Now, if I declare my variables globally, like this:

#include <iostream>
using namespace std;

char a[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
char b[8] = {'1', '2', '3', '4', '5', '6', '7', '8'};

int main(void)
{
    cout << a << endl;
    cout << b << endl;

    return 0;
}

the output will be:

abcdefgh12345678
12345678

So, indeed, cout "keeps marching through memory byte-by-byte", but only to the end of the second character array. The same thing happens with any combination of char arrays. I'm thinking that all the other addresses are initialized to 0 and that's why cout stops. Is this true? If I do something like:

for (int i = 0; i < 100; ++i)
{
    cout << *(&a + i) << endl;
}

I get mostly empty space in the output (maybe 95% of it), but not everywhere.

If, however, I declare my char arrays a little bit shorter, like:

char a[3] = {'a', 'b', 'c'};
char b[3] = {'1', '2', '3'};

keeping all other things the same, I'm getting the following output:

abc
123

Now cout doesn't even get past the first char array, let alone the second. Why is this happening? I've checked the memory addresses and they are sequential, just like in the first scenario. For example,

cout << &a << endl;
cout << &b << endl;

gives

003B903C
003B9040

Why is the behavior different in this case? Why doesn't it read beyond the first char array?

And lastly, if I declare my variables inside main, I do get the behavior Prata describes: a lot of junk gets printed before a null character is reached somewhere.

I'm guessing that in the first case the char arrays are placed on the heap, that this memory is initialized to 0 (but not everywhere; why?), and that cout behaves differently depending on the length of the char array (why?).

I'm using Visual Studio 2010 for these examples.

Deanie
mihai
  • It's undefined behaviour. You're overthinking it. Also, I haven't heard great things about that book. – chris Oct 14 '13 at 05:50
  • Yes, I'm not a fan of the book either, but the Stroustrup edition seems more like a reference to me and I haven't really heard of any other (good) alternatives. – mihai Oct 14 '13 at 06:21
  • Perhaps you can take a look at [SO's list](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – chris Oct 14 '13 at 06:24

4 Answers


It looks like your C++ compiler is allocating space in 4-byte chunks, so that every object has an address that is a multiple of 4 (the hex addresses in your output are divisible by 4). Compilers do this so that larger data types such as int and float (4 bytes wide) are aligned to 4-byte boundaries, because some kinds of hardware take longer to load, move, or store unaligned int and float values.

In your first example, each array needs 8 bytes of memory (a char fills a single byte), so the compiler allocates exactly 8 bytes. In the second example, each array is 3 bytes, so the compiler allocates 4 bytes, fills the first 3 bytes with your data, and leaves the 4th byte unused.

Now, in this second case, it appears the unused byte was filled with a null, which explains why cout stopped at the end of the string. But as others have pointed out, you cannot depend on unused bytes being initialized to any particular value, so the behaviour of the program cannot be guaranteed.

If you change your sample arrays to have 4 bytes the program will behave as in the first example.

Peter Raynham
  • Yes, you're right. It does behave like in the first example after adding one more element to the char array. I'm still not sure why the memory (the heap?!) is initialized to all 0s in the first case, but not in the second (the stack?!). – mihai Oct 14 '13 at 06:14
  • And probably nobody is, besides the people who wrote the VS compiler, because it's a waste of time to dig into something that is undefined. – tomi.lee.jones Oct 14 '13 at 07:01
  • Does `printf("%s",a);` act the same way (marching through memory byte-by-byte until it reaches a null character)? – user2513149 May 17 '20 at 19:14

The contents of memory out of bounds is indeterminate. Accessing memory you do not own, even just for reading, leads to undefined behavior.

Some programmer dude

It's undefined behaviour, so you cannot say what will happen, and its output cannot be explained in general.

Try it on another system and you may get different output.

In addition to the above explanation: in your particular case, you have declared the arrays globally. That is why, in your second example, the fourth byte of each four-byte slot (the unused byte explained by Peter Raynham) happens to be \0.

Community
0xF1

The '\0' is just one way to tell how long a string is. You could instead record the length by storing a value before the string.

In your case, though, you intentionally left it out, so the library functions (and normally your own code as well) will keep searching for the delimiter (the null character). What lies beyond the bounds of a given block of memory is undefined and varies greatly. With MinGW in debug mode under gdb it is usually zeroed out; without gdb it is just junk, although this is just my experience. Locally declared variables usually live on the stack, so what you are reading is probably your call stack.

DrakkLord