0

I have the following code that shows the dangers of using char arrays over strings:

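#include <cstdio>
#include <iostream>
using namespace std;

int main(){
    char password[] = "SECRET";
    char msg[10], ch;
    int i = 0;

    cout << "Please enter your name:";
    // Deliberately unchecked: input longer than 9 characters writes past msg.
    while((ch = getchar()) != '\n'){
        msg[i++] = ch;
    }
    msg[i] = '\0';

    cout << "\n\nHello " << msg << endl;
    cout << "The password is " << password;
}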

When I enter a name (stored in char msg[10]) that is longer than 16 characters, everything after those 16 characters replaces the value stored in char password[] ("SECRET").

  1. Why is this the case? (a general curiosity)
  2. Why 16 characters and not 10 - the size of the array?
  3. Why is it always password that gets overwritten and not some other variable or some other part of the memory where I wouldn't notice immediately?
  4. What's the benefit of using char[] over strings then?

EDIT: Updated with follow up questions:
5. In response to the argument that password and msg are declared next to each other, I shuffled the declaration block as follows:

char password[] = "SECRET";
char ch;
int i = 0;
char msg[10];

However, no change.
6. In response to the argument that it was chance that caused the gap between msg and password to be 6 (bytes?) long, I have recompiled the code many times, including the reshuffling above. Still, no change.

Any suggestions as to why?

Islay
  • @DrakaSAN: removed tag, thanks. – Islay May 20 '14 at 10:04
  • 2
    "What's the benefit of using `char[]` over `std::string`"? - nothing. – M.M May 20 '14 at 10:05
  • 1
    "Because undefined behavior". But that's not very satisfying. Because the variables are located next to each other in memory, both being `char` arrays declared next to each other, that's not very surprising. – unwind May 20 '14 at 10:06

4 Answers

4

The answer for your first three questions is the same: because that's how your compiler chose to lay out these variables on the stack. Nothing in the standard guarantees that - in fact, what you're doing is undefined behavior - anything could happen.

Change compilers, or even compiler settings, and other things might happen. Or not. There's no telling.

As for 4, except for interoperability with C code, or other APIs that require C-style strings, essentially none.

Mat
  • Thanks, makes sense. I've updated my question to include questions 5 and 6. Do you have any comments in response? – Islay May 20 '14 at 10:26
  • 1
    Same as 1, 2 and 3. The compiler lays out the stack however it wants. Also if you're not actually using the other variables, the compiler is free to completely remove them. (Unless their constructors have side-effects - not the case here.) – Mat May 20 '14 at 10:27
  • Wow, I feel like I have a lot less control than I was led into thinking. – Islay May 20 '14 at 10:31
  • 1
    Stack layout is pretty much out of your control indeed. As for what the compiler does with your code, the ["as if rule"](http://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule) might be a bit scary if you think "full control". (But don't worry, it's all for your own good. Just don't summon Undefined Behavior and you'll be safe.) – Mat May 20 '14 at 10:39
2

Your two arrays msg and password are fixed-size local variables (automatic storage, not `static` in the C++ keyword sense), and have therefore been placed on the stack, meaning they're near each other.

The specifics are implementation dependent and are likely to change between compilers and optimisation levels. It's possible that the compiler has padded the stack a bit when allocating memory and there is a 16 byte gap between msg[0] and password[0].

password gets overwritten every time because it just happens to be above msg on your stack. If you used a different compiler, or swapped their positions around in code, it might not be. How things are allocated on the stack isn't going to change between executions; the layout is determined at compile time, not at runtime.

Note that, in principle, the compiler is free to do anything it wants! We can only make educated guesses about what'll happen given typical compiler behaviour. If you really want to know what's going on, you have to look at the output assembly.

std::string (for C++) is usually preferable to char[] - it's far safer as it manages its own memory, grows to fit its contents, and offers bounds-checked access through at() (operator[] is not checked).

xen-0
  • Thanks. I edited my question to include follow-up: despite changing their position in code, the two variables still seem to be right next to each other. Is this unpredictability expected? – Islay May 20 '14 at 10:27
  • I'm not surprised. As I said, it _might_ have made a difference, but evidently it didn't. A compiler only makes promises about the observed behaviour (output) of a well-formed program; it makes no promises given a nonsensical (undefined) one. – xen-0 May 21 '14 at 09:14
2

1. In your case, memory is laid out like this:

 msg                |   |i      |password
| | | | | | | | | | |1|2|3|4|5|6|S|E|C|R|E|T|\0

Then you write on msg progressively:

 msg                |           |password
|A|Z|E|R|T|Y|U|I|O|P|1|2|3|4|5|6|S|E|C|R|E|T|\0

But if you continue:

 msg                |           |password
|A|Z|E|R|T|Y|U|I|O|P|1|2|3|4|5|6|Q|W|E|R|T|Y|

Because a char array doesn't check its length. (Search for "buffer overflow".)

2. You write into memory and overwrite everything in between, maybe i, or something that doesn't belong to your program.

3. So it takes 6 chars before you overwrite password. It could just as well have been 0 chars, or millions.

4. Unless you are storing a fixed-size array of bytes, nothing. That is the point this code proves.

UPDATE:

  1. Changing the order of the declarations won't change the padding. Add a variable or an array or, better, use a different compiler, so that the binary changes even after optimisation.

  2. Recompiling will not change the binary produced, because the compiler will do the exact same thing.

DrakaSAN
  • 1
    To be exact, the `int i` lies at 3-6 since it is placed at an address divisible by four. 1 and 2 are padding bytes. Also, do not forget the terminating `'\0'`. Memory may look more like `|A|Z|E|R|T|Y|U|I|O|P| | |1|2|3|4|Q|W|E|R|T|Y|0|`. One more tip: if you place a space between your 1 and the dot you can work around the formatting bug in your post. – vlad_tepesch May 20 '14 at 10:17
  • want a tribble-thanks ;). I made an edit to the comments just as you changed the post. – vlad_tepesch May 20 '14 at 10:25
  • Thanks. I edited my question to include follow-up: despite changing their position in code, the two variables still seem to be right next to each other. Is this unpredictability expected? Would `i` still be in between? – Islay May 20 '14 at 10:33
  • @Abhi: I've edited to answer your update. Padding is generated by the compiler, which optimises your code before compiling. In fact, i may or may not be at this place. If you want to be sure to move it, you should try a different compiler, or disable all optimisation. – DrakaSAN May 20 '14 at 10:35
  • Optimization should have no effect on padding but may affect whether stack variables exist at all (they may be kept completely in CPU registers). – vlad_tepesch May 20 '14 at 10:39
  • 2
    @Abhi it is not specified how the compiler should lay out stack variables. We inferred the memory layout from the described effects and the code. However, accessing array elements out of bounds is undefined behavior. Your compiler may cause your computer to catch fire or summon demons and it will still be compliant with the C++ standard. – vlad_tepesch May 20 '14 at 10:44
  • Hah, that might classify as 'observable behavior' and hence break the 'as-if' rule (mentioned by Mat in a comment below). Thanks for the explanations. – Islay May 20 '14 at 10:46
1

1) writing outside an array will access something else.
2) alignment probably.
3) chance. anything can happen.
4) nothing!

sp2danny