I'm quite confused about how local variables are ordered on the stack. As I understand it, on Intel x86 local variables are placed at successively lower addresses in the order they are declared. So it's clear why this code:

int i = 0;
char buffer[4];
strcpy(buffer, "aaaaaaaaaaaaaaa");
printf("%d", i);

produces something like this:

1633771873

The i variable was overwritten by the overflowed buffer.

However, if I swap the first two lines:

char buffer[4];
int i = 0;
strcpy(buffer, "aaaaaaaaaaaaaaa");
printf("%d", i);

the output is exactly the same.

How is this possible? The address of i should now be lower than that of buffer, so overflowing buffer should overwrite other data, but not i. Or am I missing something?

MaKri
  • The order in the source probably has nothing to do with the allocation order after compilation. Variable names and information are put into a hash table that is then iterated when object code is written. However, they probably use alphabetical order; try swapping the variable names, not their locations. – Hogan Aug 28 '15 at 17:42
  • C does not guarantee how the compiler will arrange your variables. It can even optimize them away. Anyway, don't expect defined results from undefined behaviour. – Eugene Sh. Aug 28 '15 at 17:43
  • You should look at the generated assembly to find out the allocation order of local variables. – cadaniluk Aug 28 '15 at 17:43
  • There is no guarantee that `i` will even be allocated stack storage; it may be stored exclusively in a CPU register (or discarded entirely by the optimizer if it is not used). – TeasingDart Aug 28 '15 at 17:46
  • @cad: Very good suggestion. You can learn a lot about how the compiler "thinks" by comparing the C source code to the assembly output. You can really see how drastically the compiler changes the code with the optimizer enabled versus disabled. – TeasingDart Aug 28 '15 at 17:48
  • There is absolutely no use in complaining that _undefined behaviour_ behaves in an undefined way! – too honest for this site Aug 28 '15 at 18:01
  • @TeasingDart: You can learn much more if you read articles about compiler optimization and the source code of OSS compilers like llvm and gcc. – too honest for this site Aug 28 '15 at 18:04

2 Answers

There is no rule about the order of local variables, so the compiler is generally free to allocate them however it likes. On the other hand, compilers use several strategies to make what you are deliberately trying to do less likely to happen by accident.

One such safety measure is to always place buffers away from scalar variables, because an array can be indexed out of bounds and is therefore more likely to clobber adjacent variables. Another trick is to leave some unused guard space after arrays to contain out-of-bounds accesses.

In any case, you can use a debugger to look at the assembly and confirm where each variable is placed.
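As a quick complement to reading the assembly, the placement can also be observed from C itself. The following is only a minimal sketch: the function names (`offset_int_first`, `offset_buffer_first`, `report_layout`) are hypothetical, and the locals are marked `volatile` on the assumption that this keeps them in memory rather than in registers. The numbers it reports are specific to one compiler, one set of flags, and one build.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte distance (&i - buffer) when i is declared before buffer. */
long offset_int_first(void)
{
    volatile int i = 0;
    volatile char buffer[4] = {0};
    /* Addresses are compared as integers: subtracting pointers to
       unrelated objects directly would be undefined behaviour. */
    return (long)((intptr_t)&i - (intptr_t)buffer);
}

/* Same measurement with the two declarations swapped. */
long offset_buffer_first(void)
{
    volatile char buffer[4] = {0};
    volatile int i = 0;
    return (long)((intptr_t)&i - (intptr_t)buffer);
}

/* Prints both distances so the two declaration orders can be compared. */
void report_layout(void)
{
    printf("int declared first:    &i - buffer = %ld\n", offset_int_first());
    printf("buffer declared first: &i - buffer = %ld\n", offset_buffer_first());
}
```

A positive result means i sits at a higher address than buffer, which is the arrangement in which overflowing buffer can reach i. Calling report_layout() from main and rebuilding with different settings (for example GCC's -O0 versus -O2, or with -fstack-protector) may give different numbers, and identical numbers for both declaration orders would match what the question observed.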

Frankie_C

If you want to see how the compiler allocates local variables, try compiling with gcc -S, which outputs the assembly code. In the assembly you can see how the compiler has chosen to order the variables.

One thing to keep in mind about how the compiler orders local variables is alignment. A char only needs 1-byte alignment, which means it can start at any byte of memory. An int, on the other hand, typically needs 4-byte alignment, which means it can only start at an address evenly divisible by 4. The compiler has its own logic for avoiding wasted padding bytes, so it often groups variables of similar type together. So even if you define them like this:

int a;
char c;
int b;
char d;

It is likely that the compiler groups the ints and the chars together in memory, so the layout, from low memory at the top to high memory at the bottom, might look something like this:

  low memory
  |      |      | char d | char c |
  |            int b              |
  |            int a              |
  high memory

Each cell between | characters represents one byte, and each row represents 4 bytes.
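The cost of not grouping can be made visible with structs, where the compiler must keep members in declaration order and therefore cannot avoid the padding. This is only an illustrative sketch: the struct names `interleaved` and `grouped` and the function `print_layouts` are hypothetical, it assumes a C11 compiler for `_Alignof`, and the exact sizes depend on the platform ABI.

```c
#include <stddef.h>
#include <stdio.h>

/* Members in the order used above: int, char, int, char. */
struct interleaved {
    int a;
    char c;
    int b;
    char d;
};

/* Same members, with the ints grouped ahead of the chars. */
struct grouped {
    int a;
    int b;
    char c;
    char d;
};

/* Prints the alignment requirements and the resulting sizes and offsets. */
void print_layouts(void)
{
    printf("alignof(char) = %zu, alignof(int) = %zu\n",
           (size_t)_Alignof(char), (size_t)_Alignof(int));
    printf("interleaved: size %zu, offsets a=%zu c=%zu b=%zu d=%zu\n",
           sizeof(struct interleaved),
           offsetof(struct interleaved, a), offsetof(struct interleaved, c),
           offsetof(struct interleaved, b), offsetof(struct interleaved, d));
    printf("grouped:     size %zu, offsets a=%zu b=%zu c=%zu d=%zu\n",
           sizeof(struct grouped),
           offsetof(struct grouped, a), offsetof(struct grouped, b),
           offsetof(struct grouped, c), offsetof(struct grouped, d));
}
```

On a typical x86 build with 4-byte ints, the interleaved struct needs padding after each char so that the following int (or the next array element) stays aligned, while the grouped one only needs padding at the end. For local variables the compiler is free to do this regrouping itself, which is one reason declaration order says little about stack layout.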

Try messing around with the assembly code sometime; it is pretty interesting.

Yashwanth