6

As I worked through the Lippman C++ Primer (5th ed, C++11), I came across this code:

char ca[] = {'C', '+', '+'};  //not null terminated
cout << strlen(ca) << endl;  //disaster: ca isn't null terminated

Calling the library strlen function on ca, which is not null-terminated, results in undefined behavior. Lippman et al. say that "the most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character."

A later exercise asks what the following code does:

const char ca[] = {'h','e','l','l','o'};
const char *cp = ca;
while (*cp) {
   cout << *cp << endl;
   ++cp;
}

My analysis: ca is a char array that is not null-terminated. cp, a pointer to const char, initially holds the address of ca[0]. The condition of the while loop dereferences cp, contextually converts the resulting char value to bool, and executes the loop body only if the conversion yields true. Since any non-null char converts to true, the loop body executes, printing the char and advancing the pointer by the size of a char. The loop thus steps through memory, printing each char until it reaches a null character.

Since ca is not null-terminated, the loop may continue well past the address of ca[4], interpreting the contents of later memory addresses as chars and writing their values to cout, until it happens upon a byte whose bits are all 0 (the null character). This behavior would be similar to what Lippman et al. suggest strlen(ca) does in the earlier example.

However, when I actually execute the code (again compiling with g++ -std=c++11), the program consistently prints:

'h'
'e'
'l'
'l'
'o'

and terminates. Why?

ivme
  • 4
    Hint: the behaviour you're seeing perfectly matches your explanation of what you expect to happen. BTW, the use of `cout` clearly isn't C, removed that tag. –  Jun 21 '16 at 22:00
  • 7
    Because you are lucky? – Frank Puffer Jun 21 '16 at 22:01
  • 1
    And "undefined behavior" is, well, undefined. – Andrew Henle Jun 21 '16 at 22:03
  • 4
    @FrankPuffer, actually, he is **very unlucky**, because it didn't crash quickly enough for him to find his bug. UB is terrible stuff. – WhiZTiM Jun 21 '16 at 22:03
  • I call that unlucky. In a non-trivial piece of code with real importance to life or financial well-being, a crash would be a lot luckier than the broken program continuing to run. – user4581301 Jun 21 '16 at 22:04
  • @hvd I actually expected that a number of arbitrary characters would be printed following the 5 characters in the array. I was surprised that the contents of memory off the end of the array were consistently zero, but MikeMB's answer clarifies. – ivme Jun 21 '16 at 22:17
  • @AndrewHenle is the behavior in second code snippet actually undefined? I assumed that the behavior in the first code snippet was undefined because of something in the implementation of std::strlen. But it isn't clear to me that the behavior of the second code snippet is actually undefined. Shouldn't it behave exactly as I have described? – ivme Jun 21 '16 at 22:24
  • 1
    @Chad: The behavior is undefined, because you (or rather the << operator) are dereferencing a pointer to a memory location outside of the array. – MikeMB Jun 21 '16 at 22:36

2 Answers

4

Most likely explanation: On modern desktop/server operating systems like Windows and Linux, memory is zeroed out before it is mapped into the address space of a program. So as long as the program doesn't use the adjacent memory locations for something else, the array will look like a null-terminated string. In your case, the adjacent bytes are probably just padding, as most variables are at least 4-byte aligned.

As far as the language is concerned this is just one possible realization of undefined behavior.
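The padding idea can be made visible without reading out of bounds. The following is only a sketch: `Sample` and `padding_after_text` are hypothetical names, and a struct is used as a stand-in for whatever the compiler places after `ca` on the stack, which the language does not specify.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout, for illustration only: a 5-char array followed by an int.
struct Sample {
    char text[5];
    int  next;
};

// Number of padding bytes the compiler inserted between text and next.
inline std::size_t padding_after_text() {
    return offsetof(Sample, next) - sizeof(Sample::text);
}

// The int member always starts at an offset that satisfies its alignment,
// so any gap after the 5 chars is padding.
inline void check_alignment() {
    assert(offsetof(Sample, next) % alignof(int) == 0);
}
```

On a platform where int is 4-byte aligned, `padding_after_text()` would typically return 3. The contents of padding bytes are not specified by the language, so seeing zeros there is an observation about one implementation, not something to rely on.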

MikeMB
  • if memory is zeroed out, then why are stack variables almost always garbage, and locals of functions, garbage value *even on first call*? – WhiZTiM Jun 21 '16 at 22:17
  • @WhiZTiM: Probably because that address space was already used in previous function calls. – MikeMB Jun 21 '16 at 22:20
  • @WhiZTiM: I'm on my mobile and don't have the time to do a thorough search for proper citations, but here is one: http://stackoverflow.com/questions/6004816/kernel-zeroes-memory and if you google a bit, you'll find plenty of other sources. This is (I believe) mainly a security feature, to prevent one process from reading data that was freed by another process. – MikeMB Jun 21 '16 at 22:22
  • Thanks for the citation. As for why stack variables hold garbage values *even on first call*: one reason could be that a stack variable may be mapped directly to a CPU register, which is definitely not zeroed by the OS. Reading it simply reads whatever is in the register. And recall that the OS saves the state of the registers for each process. – WhiZTiM Jun 21 '16 at 22:43
  • Nonetheless, the padding assertion for the OP's question makes some sense. +1 – WhiZTiM Jun 21 '16 at 22:44
  • @WhiZTiM: A lot of functions are called even before your program enters main. – MikeMB Jun 21 '16 at 22:59
0

Are list-initialized char arrays still null-terminated?

There is no implicit null-terminator.

A list-initialized char array contains a null-terminated string if at least one of its characters is initialized with the null-terminator.

If none of the characters are the null-terminator, then the array does not contain a null-terminated string.
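A small sketch of the difference, using array names and a helper (`holds_terminated_string`) that are made up for this illustration. It only inspects bytes inside each array, so nothing here relies on undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// List initialization stores exactly the listed elements; nothing is appended.
const char listed[] = {'h', 'e', 'l', 'l', 'o'};
// Listing '\0' explicitly makes the array hold a null-terminated string.
const char terminated[] = {'h', 'e', 'l', 'l', 'o', '\0'};
// A string literal brings its own terminator.
const char literal[] = "hello";

static_assert(sizeof(listed) == 5, "no implicit terminator");
static_assert(sizeof(terminated) == 6, "explicit terminator counts as an element");
static_assert(sizeof(literal) == 6, "literal includes its terminator");

// Hypothetical helper: true if some element within the array's bounds is '\0'.
template <std::size_t N>
bool holds_terminated_string(const char (&arr)[N]) {
    return std::memchr(arr, '\0', N) != nullptr;
}

// Usage: strlen is only safe on the arrays that pass the check.
inline void check_examples() {
    assert(!holds_terminated_string(listed));
    assert(holds_terminated_string(terminated));
    assert(std::strlen(terminated) == 5);
    assert(std::strlen(literal) == 5);
}
```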

the program consistently prints ... and terminates. Why?

You analyzed that the array would be accessed out of bounds. Your analysis is correct. You should also know that accessing an array out of bounds has undefined behaviour. So the answer to why it behaves like this is: because the behaviour is undefined.

As I already mentioned, your analysis is correct. Only your (implied) assumption, that the first value read out of bounds must be non-zero, is wrong: nothing guarantees it.
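For comparison, here is one way the exercise's loop could be written so that it never leaves the array. This is a sketch, not the book's solution: `print_bounded` is a hypothetical helper, and it returns the text instead of writing to cout so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: collects one character per line, stopping at a
// null character or at the end of the array, whichever comes first.
template <std::size_t N>
std::string print_bounded(const char (&arr)[N]) {
    std::string out;
    // The bounds check comes first, so *cp is never evaluated past the end.
    for (const char *cp = std::begin(arr); cp != std::end(arr) && *cp; ++cp) {
        out += *cp;
        out += '\n';
    }
    return out;
}

// Usage with the array from the question.
inline void check_bounded() {
    const char ca[] = {'h', 'e', 'l', 'l', 'o'};
    assert(print_bounded(ca) == "h\ne\nl\nl\no\n");
}
```

Because the array's size is part of its type, the loop has a well-defined end whether or not a terminator is present.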

eerorika
  • "*They are null-terminated, if at least one of the character is initialized with the null-terminator.*" That's not what null-termination means. – Nicol Bolas Jun 21 '16 at 22:58
  • @NicolBolas thanks for the review. I hope that the wording is more to your liking now. – eerorika Jun 21 '16 at 23:03