4

I was playing around with C strings in C++ and found some behavior I don't understand when I don't terminate a char array.

char strA[2] = {'a','\0'};
char strB[1] = {'b'};
cout << strA << strB;

I would expect this to print ab, but instead it prints aba. If I instead declare strB before strA, it works as expected. Could someone explain what's going on here?

conrad12345
  • You're at the mercy of how things are laid out on the stack by the compiler. The `strA` is being placed later in memory (i.e. earlier in the stack) than `strB`. The strings are adjacent in memory, so when you print the non-terminated `strB` it continues on, printing `strA` as well before hitting the `\0` in `strA`. – Jonesinator Sep 30 '17 at 22:38
  • Undefined behavior. – Ron Sep 30 '17 at 22:49
  • You are playing with char arrays, not with C strings. –  Sep 30 '17 at 23:02
  • When outputting an array of char (which it receives as a pointer), `cout`'s `operator<<()`, like most C I/O functions, ASSUMES there is a terminating zero present and keeps going until it finds it. If there is no terminator, the result is undefined behaviour: anything can happen. – Peter Sep 30 '17 at 23:59
  • Possible duplicate of [What happened when we do not include '\0' at the end of string in C?](https://stackoverflow.com/questions/34995106/what-happened-when-we-do-not-include-0-at-the-end-of-string-in-c) – phuclv Sep 14 '18 at 08:59
  • [why printf works on non-terminated string](https://stackoverflow.com/q/4999901/995714), [String is longer than expected in C](https://stackoverflow.com/q/33707486/995714) – phuclv Sep 14 '18 at 09:00

2 Answers

9

This is undefined behaviour, and you are simply lucky that swapping the declarations of these two arrays works for you. Let's see what is happening in your code:

char strA[2] = {'a','\0'};

Creates an array that can be treated like a string - it is null terminated.

char strB[1] = {'b'};

Creates an array that cannot be treated like a string, because it lacks the null terminating character '\0'.


std::cout << strA << strB;

The first part, << strA, works fine. It prints a because strA decays to a pointer to its first element, and the std::ostream operator<< overload that takes a const char* prints every character until it reaches the null terminator.

What happens then? Then << strB is executed (strictly speaking, the chained expression is not quite the same as two separate std::cout << statements, but that does not matter here). strB is also treated as a const char*, which is expected to end with the '\0' mentioned above, but it does not...

What does that lead to? operator<< reads past the end of strB, which is undefined behaviour. In your run it happened to find only one extra character before a '\0' in memory, and that '\0' stopped what could otherwise have been a very long run of garbage output.


Why, if I instead declare strB before strA, it works as expected?

That comes down to luck as well. In your original order, the compiler happened to place strA right after strB in memory, so printing strB printed its own character and then ran on into strA, stopping at strA's null terminator. With the declarations swapped, the layout is different and a zero byte happened to follow strB, so the output looked correct. Neither outcome is guaranteed. Avoid using char[] to represent and print strings. Use std::string instead, which takes care of the null terminator for you.

Fureeish
0

When printing char arrays, the C (and C++) convention is to print all bytes until a '\0'.

Because of how the local variables are laid out, strB sits immediately before strA in memory, so printing strB simply 'overflows' into strA and keeps going until strA's terminating '\0'.

I guess when the declaration order is reversed, the printing of strB is terminated by a 0 that just happens to be there because nothing else was written to that location, but you shouldn't rely on that: the contents of such memory are garbage values.

Don't use unterminated C-strings at all. Avoid C-strings in general, too: C++ std::string is much safer and more fun.

When I run this code on my computer, a bunch (exactly seven) of weird chars are printed between the ab and the final a; they are probably whatever lies between strB's and strA's memory.
When I reverse the declarations, I get ab$%^& where $%^& stands for a bunch of weird chars: the ones between the end of strB's memory and the next random \0.

Neo