-3

The following snippet is intentionally accessing the next sizeof(int) bytes following t[4], so I am aware of the mistake that is being made here. I am just doing this as an experiment to see how the compiler handles the stack allocations.

int t[5], i;

for (i = 0; i <= 5; i++) {
   t[i] = 0;
}

When executing this code on Windows, using a ported version of the GNU C Compiler, the program always gets stuck in an infinite loop. I am sure that this could only happen because t and i are allocated sequentially on the stack, one after the other, and t[5] refers to the same address as the i variable. Therefore, when executing t[5] = 0 the program actually sets the value of i to zero.

However, when compiling this with a different version of the GNU C Compiler, I never get the infinite loop. The address of t[5] is not the same as the address of i.

My question is, why this different behavior? I know you should not assume anything about the outcome of this, but is it not the case that stack allocations should happen in the same way?

What I am really curious about is how does the compiler manage those stack allocations. Is there any padding? Is the order always the same as in the source code? Obviously this has nothing to do with the C standard and there are differences between implementations, or even different versions of the same compiler. I am curious though what are the possible outcomes and considerations in this particular case.

Alexandru Pele
  • 3
    There is a difference between `C` and `C++`. Please do not tag the question with both. – Box Box Box Box Feb 25 '16 at 08:55
  • 9
    "I think that the stack allocations should happen in the same way." - why do you think this? This is entirely an implementation detail, not in any way defined by the language. – BoBTFish Feb 25 '16 at 08:55
  • 3
    Why do you think the cause of the endless loop is the assignment modifying `i`? It can just as well be the compiler knowing that `i <= 5` can never be false and optimizing the comparison out. – David Schwartz Feb 25 '16 at 09:01
  • 1
    You're using different compilers on different platforms, thus the different behaviour – Shark Feb 25 '16 at 09:02
  • 5
    Undefined behavior = anything can happen. You can tell that anything does indeed happen in both cases, so the behavior is as expected (there is no expected behavior). Investigating why undefined behavior caused a particular result on a particular system, at a particular point in time, is simply not meaningful practice. – Lundin Feb 25 '16 at 09:14
  • @Lundin as I've mentioned in the post, on Windows the behavior is always the same. I think it's safe to assume that in this case, those allocations were made sequentially without any padding. I agree that this might happen only in a particular case. This would've been a good answer. – Alexandru Pele Feb 25 '16 at 09:21
  • 1
    @AlexandruPele, well then you're lucky. As Lundin mentioned, undefined behavior means anything can happen, and one possible outcome is that the behavior stays the same for you, but it might be different the next time you run it. You CANNOT PREDICT what will happen with UB. – Box Box Box Box Feb 25 '16 at 09:24
  • We need a 'seeks to have UB explained' canonical:( – Martin James Feb 25 '16 at 10:14
  • @AlexandruPele "Seemingly always the same result" is one of many forms of "anything can happen". It is not at all safe to assume that the allocation is done in a certain way, unless you can prove it by reading the compiler documentation and find out that the compiler makes guarantees beyond the scope of the C standard. In which case the behavior may be deterministic, for that particular compiler on that particular system. And you would need to document with comments that your program relies on such non-standard behavior. The code would still remain non-portable. – Lundin Feb 25 '16 at 10:58
  • @Lundin totally agree with you – Alexandru Pele Feb 25 '16 at 11:03
  • @AshishAhuja I am well aware of the differences between C and C++. However, this is a discussion that could also apply to C++, as the differences between the two have no impact on this topic. – Alexandru Pele Feb 25 '16 at 11:22
  • You had this question marked `C` and `C++`, and are outrightly saying that doing that is okay. Remarkable! – Box Box Box Box Feb 25 '16 at 11:28
  • @AshishAhuja what's so wrong with what I've just said? I've said that it is not a mistake to label with both C and C++ in the context of this particular problem. If I was using a language feature (OOP or the like) that was particular only to C++ then it would have indeed been a problem... – Alexandru Pele Feb 25 '16 at 11:34
  • I am trying to help you, not trying to look smart. In any case, tagging a question with two languages like `C` and `C++` is wrong. You are saying that this applies to both `C` and `C++`, which is right, but by that logic you could even add a tag for the `B` language, since this code would work there too because `C` originated from it. Adding tags like this is wrong, because in any case, a program in one language will not work in another. When you write this in C, you will add different headers, put `int main`, print using `printf`, etc., which is totally different from the C++ program – Box Box Box Box Feb 25 '16 at 11:40
  • Thanks for your answer! I respect your opinion but I still believe that this is a perfectly valid thing to do **in this particular case**. – Alexandru Pele Feb 25 '16 at 11:43
  • 1
    @AlexandruPele The problem with using multiple language tags is that there are subtle differences between C and C++, which you might not even realize when you are writing the question. Just looking at your little example, the answer might depend on whether `t` is a VLA or not, it might depend on if `i` is allowed to be declared inside the loop or not (C90), it might depend on operator overloading, it might depend on if a boolean condition results in a `bool` or an `int`. And so on. They are two different languages and not necessarily compatible even when it seems like they should be. – Lundin Feb 25 '16 at 12:05
  • @AlexandruPele, well it is not a perfectly valid thing to do in this particular case, and in other cases also. There are a lot of differences. Some are mentioned by lundin. – Box Box Box Box Feb 25 '16 at 12:35

3 Answers

9

You are dealing with undefined behaviour. The compiler isn't required to lay out automatic variables sequentially (as they appear in the source code). Some of them might be in registers or they might be ordered in a different way, if, for example, smaller offsets are cheaper.

Such a requirement exists only for the members of a struct (with the members having arbitrary padding in between).

Is there any padding?

Yes, the compiler would honour the alignment requirements of each type and place the variables accordingly.

Is the order always the same as in the source code?

No, but this is the thing many exploits rely on. A buffer overflow may overwrite an adjacent variable and compromise the execution of the entire program.

Blagovest Buyukliev
4

Others have explained this behaviour from the standard's point of view; I will describe what could actually happen once your code is compiled with optimisations and executed.

First: the loop might be unrolled into six assignments:

t[0] = 0;
t[1] = 0;
t[2] = 0;
t[3] = 0;
t[4] = 0;
t[5] = 0;
i = 6;

This is a permitted optimisation, and it is one thing that could happen. Moreover, if i is not used later, it can be removed altogether.

Second: the compiler might keep i in a register without allocating any stack space for it.

Third: it might place the variables in any order on the stack. There are no requirements on the order in which variables are kept in memory (or on their locality).

How do you know what happened? Look at the generated assembly. It is the only way to know what actually happens.

BTW: the infinite loop does not always happen on Windows. In fact, I wasn't able to get your code, taken verbatim, to loop forever.

Revolver_Ocelot
  • thanks! I didn't really consider compiler optimizations or the fact that the order in the stack is not necessarily the same. – Alexandru Pele Feb 25 '16 at 09:38
3

Accessing t[5] is undefined behaviour. The last element is t[4] (the elements are t[0], t[1], t[2], t[3], t[4]); there is no t[5].

With undefined behaviour, anything may happen. It may give the expected results or something totally messed up.

My question is, why this different behavior?

As I wrote before, it is UB, so you cannot expect anything. You may even get a different result on the same machine if you run it many times.

Humam Helfawi
  • @Garf365: The C++ standard can't explain it, but it is certainly not the case that *nothing* can. – Benjamin Lindley Feb 25 '16 at 09:09
  • @BenjaminLindley Yes, but why? Why should I know what will happen if I do an undefined thing? – Humam Helfawi Feb 25 '16 at 09:10
  • 2
    @HumamHelfawi: I don't know. Perhaps you are interested in how compilers work. – Benjamin Lindley Feb 25 '16 at 09:12
  • @BenjaminLindley it is not about how the compiler works. It is about how compilers deal with the things that you should not use :) It should not even be documented, and you could not even bring a case in court against the compiler vendor if the compiler changed this behaviour in its next version :) – Humam Helfawi Feb 25 '16 at 09:13
  • @HumamHelfawi: Not really sure what your point is. I was simply correcting a false statement (that's no longer there). I am not arguing that your answer needs modification. – Benjamin Lindley Feb 25 '16 at 09:14
  • "Garf365" OH! that it is not me! :D Forget about my comments and have a nice day :D – Humam Helfawi Feb 25 '16 at 09:16
  • 1
    @AlexandruPele perhaps the question is simply... badly worded. It seems to me that you're asking something in [THIS](http://stackoverflow.com/questions/3318410/pragma-pack-effect) direction, but applied to variables and "same code, different compilers, different platform" scope. – Shark Feb 25 '16 at 09:26