
I am currently working my way through "C++ Primer". One of the exercise questions asks:

What does the following program do?

const char ca[] = { 'h', 'e', 'l', 'l', 'o' };

const char *cp = ca;

while (*cp)
{
    cout << *cp << endl;

    cp++;
}

I am fairly confident I understand that *cp will continue to be true past the last character of the ca[] array, because there is no null character as the last item in the array.

I am asking mostly out of curiosity about what makes the while-loop condition become false. It always seems to show 19 characters on my computer: 0-4 are the hello string, 5-11 are always the same, and 12-19 change with each execution.

#include <iostream>

using namespace std;

int main( )
{
    const char ca[ ] = { 'h', 'e', 'l', 'l', 'o'/*, '\0'*/ };

    const char *cp = ca;

    int count = 0;

    while ( *cp )
    {
        // {counter} {object-pointed-to} {number-equivalent}
        cout << count << "\t" << *cp << "\t" << (int)*cp << endl;

        count++;
        cp++;
    }

    return 0;
}

The question: What causes the while-loop to become invalid? Why is 5-11 always the same character?

Compton

4 Answers


C++ allows you to form pointers to ca[0] through one past ca[4], inclusive. You may dereference (i.e. apply operator * to) only the pointers to ca[0] through ca[4], though; the pointer to one past ca[4] is off limits. As soon as you dereference that pointer, the behavior is undefined. The program may produce any data, or even crash.

What happens in practice is simpler: a pointer is just an address in memory, so dereferencing it continues to deliver bytes to your program. At some point, the address contains a zero byte. That is when your loop stops.

Your array is allocated in automatic memory, and most compilers use the CPU stack for it. Bytes 5..20 very likely include cp and count, because compilers tend to place local variables together. There is probably some padding in between, because pointers and ints are usually aligned at addresses divisible by 4 (or 8 for pointers on 64-bit platforms). Naturally, you cannot count on any of that, because other compilers may lay things out differently.

Sergey Kalinichenko

Since the language leaves the result of accessing an array out of its bounds undefined, if you want to understand what happens you have to understand how your "platform" works.

With most compilers, your memory is probably laid out like this:

|h|e|l|l|o|XXX|____cp___|__count__|

XXX are "padding bytes" needed to align what follows to 8. In debug builds, compilers typically fill these bytes with fixed nonzero values, precisely so that an out-of-bounds iteration does not stop there (and you can discover it).

cp is a pointer to the 'h' that is incremented one step at a time. Its value is normally an address within the stack of your process.

This address usually has a fixed prefix, plus an offset that grows as you go deeper into nested calls.

Since a pointer is (probably) 8 bytes long (with the least significant bytes stored first, because x86 processors are little-endian), what you get is an iteration that prints:

  • The five "hello" characters
  • The three padding characters (assuming they are somehow printable)
  • The cp offset from the beginning of the stack (always the same, since main is always in the same place with respect to the program itself)
  • Part of the process prefix (this changes at every invocation)

This prefix may contain a zero byte at some point, which terminates the loop.

Note that, however much sense this explanation makes, you cannot rely on it in any way for production code that will be compiled for a different platform, or even by a different compiler, since the way they manage variables can also be different.

Emilio Garavaglia
  • Thanks everyone for the reply. I was aware that going past the last element in the array led to undefined behaviour. In my mind, I was expecting it to continue until it found a '0'. I was surprised by the consistency of what came after 'Hello'. All part of the fun in learning a new language, especially coming from C#. Thanks Emilio (and others) for your helpful replies. Matt. – Compton Aug 13 '16 at 11:39

The important thing to know is that you are experiencing undefined behavior. Whatever you see now can be different when you use a different compiler or different compiler options.

The most likely explanation for the 'constant' values at 5-11 is that you are reading a part of the stack that just happens to hold the same values every time.

JVApen

For starters:

This stores an array of characters (a copy of the string literal), including the \0 null terminator:

const char ca[] = "hello";

This does not include the null terminator. It initializes the array from your initializer list:

const char ca[] = { 'h', 'e', 'l', 'l', 'o' };

It works the same way as:

const int ib[] = {2, 5, 7, 9};

That makes sense, because the compiler should not add extra elements to your array.


const char ca[] = { 'h', 'e', 'l', 'l', 'o' };
const char *cp = ca;
while (*cp){
    cout << *cp << endl;
    cp++;
}

Well, you have undefined behavior in your code, because you dereference past the end of your array: there is no \0 null terminator in it.

What causes the while-loop to become invalid?

After printing the last character in your array, your program keeps reading and printing from unknown memory locations until it reaches a byte with the value 0, at which point the loop exits.

Why is 5-11 always the same character?

As to why they look the same: stack variables are arranged more or less linearly, as the compiler wishes; memory is also padded; and memory is reused. Hence you are reading the addresses and contents of things you have no legal business with.


Footnote: Your program may call other functions before int main() (what it calls is not your business). These functions initialize static variables, which include things like std::cout.

WhiZTiM