1

I wrote a very trivial program to try to examine the undefined behavior attached to buffer overflows. Specifically, regarding what happens when you perform a read on data outside the allocated space.

#include <iostream>
#include<iomanip>

int main() {
    int values[10];
    for (int i = 0; i < 10; i++) {
        values[i] = i;
    }

    std::cout << values << " ";
    std::cout << std::endl;
    for (int i = 0; i < 11; i++) {
        //UB occurs here when values[i] is executed with i == 10
        std::cout << std::setw(2) << i << "(" << (values + i) << "): " << values[i] << std::endl;
    }
    system("pause");
    return 0;
}

When I run this program on Visual Studio, the results aren't terribly surprising: reading index 10 produces garbage:

000000000025FD70 
 0(000000000025FD70): 0
 1(000000000025FD74): 1
 2(000000000025FD78): 2
 3(000000000025FD7C): 3
 4(000000000025FD80): 4
 5(000000000025FD84): 5
 6(000000000025FD88): 6
 7(000000000025FD8C): 7
 8(000000000025FD90): 8
 9(000000000025FD94): 9
10(000000000025FD98): -1966502944
Press any key to continue . . . 

But when I fed this program into Ideone.com's online compiler, I got extremely bizarre behavior:

0xff8cac48 
0(0xff8cac48): 0
1(0xff8cac4c): 1
2(0xff8cac50): 2
3(0xff8cac54): 3
4(0xff8cac58): 4
5(0xff8cac5c): 5
6(0xff8cac60): 6
7(0xff8cac64): 7
8(0xff8cac68): 8
9(0xff8cac6c): 9
10(0xff8cac70): 1
11(0xff8cac74): -7557836
12(0xff8cac78): -7557984
13(0xff8cac7c): 1435443200
14(0xff8cac80): 0
15(0xff8cac84): 0
16(0xff8cac88): 0
17(0xff8cac8c): 1434052387
18(0xff8cac90): 134515248
19(0xff8cac94): 0
20(0xff8cac98): 0
21(0xff8cac9c): 1434052387
22(0xff8caca0): 1
23(0xff8caca4): -7557836
24(0xff8caca8): -7557828
25(0xff8cacac): 1432254426
26(0xff8cacb0): 1
27(0xff8cacb4): -7557836
28(0xff8cacb8): -7557932
29(0xff8cacbc): 134520132
30(0xff8cacc0): 134513420
31(0xff8cacc4): 1435443200
32(0xff8cacc8): 0
33(0xff8caccc): 0
34(0xff8cacd0): 0
35(0xff8cacd4): 346972086
36(0xff8cacd8): -29697309
37(0xff8cacdc): 0
38(0xff8cace0): 0
39(0xff8cace4): 0
40(0xff8cace8): 1
41(0xff8cacec): 134514984
42(0xff8cacf0): 0
43(0xff8cacf4): 1432277024
44(0xff8cacf8): 1434052153
45(0xff8cacfc): 1432326144
46(0xff8cad00): 1
47(0xff8cad04): 134514984
... 
//The heck?! This just ends with a Runtime Error after like 200 lines.

So apparently, with their compiler, overrunning the buffer by a single index causes the program to enter an infinite loop!

Now, to reiterate: I realize that I'm dealing with undefined behavior here. But despite that, I'd like to know what on earth is happening behind the scenes to cause this. The code that physically performs the buffer overrun is still performing a read of 4 bytes and writing whatever it reads to a (presumably better protected) buffer. What is the compiler/CPU doing that causes these issues?

Xirema
  • 19,889
  • 4
  • 32
  • 68
  • 2
    If a program exhibits UB you can't usefully reason about its behaviour. –  Jan 27 '17 at 20:21
  • @NeilButterworth What if I want to use*lessly* reason about its behavior? =D – Xirema Jan 27 '17 at 20:22
  • 2
    Do note questions really should be useful for future reads. I struggle to see any usefulness in this. – NathanOliver Jan 27 '17 at 20:23
  • 1
    @Xi Then this really isn't the place for it. –  Jan 27 '17 at 20:23
  • 1
    I agree with @NeilButterworth. I wouldn't go down that road, trying to find out what's happening internally here, because we simply lack information. For all we know, ideone.com might not even run the code on x86 (or is there some guarantee for this?). The OS, calling conventions, compiler, and other things determine what happens here and we really don't know about that all. If you can make precise statements about the environment in use, tell us about it, and we'll give it a shot. – cadaniluk Jan 27 '17 at 20:24
  • Just for fun, try changing the integer constant `10` to an actual `static int` variable, and add that variable to your output. – Andrew Henle Jan 27 '17 at 20:25
  • I guess because `values[i]` is only valid until `i <= 9`, so `i < 11` is either "true" or causing UB, so the optimizer ignores the UB case and simplifies the condition to `true` and turn it into an infinite loop. – kennytm Jan 27 '17 at 20:29
  • 1
    Once you write UB the compiler can do whatever it wants and different compilers will do different things. The program simply has no meaning if it contains UB. No point trying to reason about it. – Jesper Juhl Jan 27 '17 at 20:36
  • 1
    @JesperJuhl Have you ever written anything that is not a strictly conforming C++ program? Have you ever used *any* GUI library, *any* POSIX call, *any* network call? If so, as far as the standard is concerned, your program has UB. Knowing what UB is okay (because it's a supported extension) and what isn't (because the compiler will aggressively assume it never happens in any program) is important. It just so happens that in this case, it falls in the latter category, but your comment goes too far. –  Jan 27 '17 at 20:41
  • @JesperJuhl, disagree. Reasoning about behavior is good, as it allows one to troubleshoot back - see the program behavior and understand what is causing it. – SergeyA Jan 27 '17 at 20:42
  • @jvf I don't think you understand what undefined behaviour is - it is certainly not necessarily a result of calling third-party libraries. –  Jan 27 '17 at 20:43
  • @hvd, one should not mix undefined behavior and simply something which is not unspecified. GUI library itself does not have to impose undefined behavior - as long as it itself is well-formed. – SergeyA Jan 27 '17 at 20:44
  • 1
    I gotta say, I'm very disheartened by every person insisting "it's UB, there's no point trying to discuss it". Not everything on this website needs to be constrained solely to the realm of purely practical knowledge. – Xirema Jan 27 '17 at 20:49
  • @Xi Apart from anything else, there is no point discussing it here because SO is explicitly not a discussion forum. –  Jan 27 '17 at 20:56
  • @NeilButterworth "discuss" was the wrong word. I posted a strange behavior I observed, asked why it was happening, someone came up with a good explanation, and I used their explanation to verify what I was observing. If that's *not* what this website is meant to be used for, then what exactly is its purpose? Saying "It's UB, don't try to think about it" isn't helpful, it's just lazy. I understand that sometimes novice programmers run into UB and need to be steered away from it hastily, but this clearly isn't a scenario where that's the highest priority. – Xirema Jan 27 '17 at 21:03
  • @hvd - right, but it defines the rules which are applied to compile those libraries. Otherwise, any line of code which is not explicitly listed in standard would have to be considered undefined behavior. – SergeyA Jan 27 '17 at 21:06
  • @SergeyA Agreed, as long as the methods contained in those libraries avoid UB. But there's no way for a GUI library to avoid UB internally, because the standard library doesn't provide the building blocks needed for it. –  Jan 27 '17 at 21:08
  • @hvd, I beg to differ. For example, GUI library might simply write to video memory, and there is nothing undefined about it (writing to non-safely derived pointer is implementation defined, rather than undefined). – SergeyA Jan 27 '17 at 21:10
  • @SergeyA Hmm, you have a point. I'm not aware of compiler or standard library implementations that define this, or GUI library implementations that make use of it, but I suppose there is nothing forbidding it. –  Jan 27 '17 at 21:15
  • Sounds like aggressive loop optimization see [this](http://stackoverflow.com/q/32506643/1708801) and [this](http://stackoverflow.com/q/24296571/1708801) – Shafik Yaghmour Jan 27 '17 at 21:46
  • Wandbox was finally fixed, confirmed my comment above that this is aggressive loop optimization at work. If we use `-fno-aggressive-loop-optimizations` then the behavior changes, [see it live](http://melpon.org/wandbox/permlink/aGIZgs17kx806hHE) – Shafik Yaghmour Jan 27 '17 at 23:33

1 Answers1

4

There are two execution paths leading to the condition i < 11 being evaluated.

The first is before the initial loop iteration. Since i had been initialised to 0 just before the check, this is trivially true.

The second is after a successful loop iteration. Since the loop iteration caused values[i] to be accessed, and values only has 10 elements, this can only be valid if i < 10. And if i < 10, after i++, i < 11 must also be true.

This is what Ideone's compiler (GCC) is detecting. There is no way the condition i < 11 can ever be false unless you have an invalid program, therefore it can be optimised away. At the same time, your compiler doesn't go out of its way to check whether you might have an invalid program unless you provide additional options to tell it to do so (such as -fsanitize=undefined in GCC/clang).

This is a trade off implementations must make. They can favour understandable behaviour for invalid programs, or they can favour raw speed for valid programs. Or a mix of both. GCC definitely focuses greatly on the latter, at least by default.

  • 1
    I think, in this particular case, compiler might warn instead of aggresively optimizing. Optimizations it needs to perform in this case are clear producing a conflicting program (at one point, it has to assume that 10 + 1 is lesser than 11). Granted, warning is not mandated by standard, but so is optimization, and warning would serve the interest of users better. – SergeyA Jan 27 '17 at 20:42
  • I did some testing using Godbolt and it looks like your explanation is correct: when the [check value is 11](https://godbolt.org/g/FrCPVN), the compiler completely removes the assembly associated with jumping out of the loop, whereas [with a value of 10](https://godbolt.org/g/oQJR0q), the two instructions are added back. – Xirema Jan 27 '17 at 20:43
  • I agree with you @SergeyA: the compiler is so smart to understand that the condition the programmer wrote will be always true: before optimizing, it should warn the poor programmer "you really want to check a condition that is always true?" – Gian Paolo Jan 27 '17 at 20:44
  • @SergeyA Ideone uses GCC 5.1, and GCC 5.1 does warn for it. It generates "warning: iteration 10u invokes undefined behavior [-Waggressive-loop-optimizations]", this warning is enabled by default, but is only emitted when GCC actually detects the problem, which depends very much on the optimisation level. –  Jan 27 '17 at 20:45
  • 1
    @SergeyA Change `-O3` to `-O` and you get the warning. That's what I meant by "when GCC actually detects the problem". Unfortunately it doesn't manage to detect it reliably. –  Jan 27 '17 at 20:51
  • @SergeyA this is finicky warning also see my [comment here](http://stackoverflow.com/questions/41902356/undefined-behavior-quirk-reading-outside-a-buffer-causes-a-loop-to-never-termin#comment70989455_41902356) – Shafik Yaghmour Jan 27 '17 at 23:13
  • Non-portable programs are not "invalid". Many implementations offer behavioral guarantees in cases where the Standard doesn't mandate it, and in many cases code which exploits such guarantees may be able to execute faster than code which needs to limit itself to those guarantees provided by the Standard. On many platforms, however, the most efficient code could be achieved by having compilers guarantee that certain actions would at worst yield an Unspecified value, and by having programmers exploit that guarantee. – supercat Jan 29 '17 at 16:22
  • @supercat I used "invalid" because the categories "strictly conforming program" and "programs that have defined behaviour" are too narrow and the category "conforming program" is too broad. I didn't claim all non-portable programs are invalid, but certainly programs where the behaviour is not defined by the standard, where the behaviour is not defined by the implementation, and where the actual concrete behaviour is not what the programmer intended because of it, should not be considered valid. –  Jan 29 '17 at 16:28
  • If, for example, a program is supposed to search many possibilities to identify the few that are of interest, it's often helpful to use a screening function to separate uninteresting possibilities from "potentially-interesting" ones. If a compiler can guarantee that certain actions will at worst yield Unspecified results, code exploiting that guarantee could give a compiler more freedom than code which must accept or reject candidates determinstically even in cases where it wouldn't matter. – supercat Jan 29 '17 at 16:35
  • @hvd: When C was invented, the expression `foo[i]` instructed the compiler to scale `i` by `sizeof *foo`, adding that to address `foo`, and using the object at the appropriate address, with whatever consequences might befall. Specifying that implementations are only required to behave in such fashion with indices that are in bounds would have been useful in two ways in 1989: (1) it allows compilers to reorder accesses to unrelated objects; (2) it allows implementations to add bounds-checking features if they so choose (I recall some compilers of that era advertising that). – supercat Jan 29 '17 at 16:45
  • @hvd; For needle-in-haystack programs where all interesting possibilities and 99% of uninteresting possibilities would yield in-bounds array indices, optimal performance may be achieved by having a compiler perform treat reads beyond the array as yielding Unspecified value. That would allow accesses to following objects to be reordered with respect to array accesses, but not require the machine code to include instructions to prevent accesses to those objects. Undefined Behavior is only good for optimization in cases where programmers don't have to add logic to guard against it. – supercat Jan 29 '17 at 16:57
  • @supercat I do not see what the main message is you're trying to get across and how it relates to my answer, so I cannot respond to that without getting into irrelevant details. This is not exactly the first time I'm having trouble understanding your comments. It could well just be a failure in my reading comprehension, but either way, the result is the same: if you want a meaningful response from me, I'm going to really need you to be clearer, sorry. –  Jan 29 '17 at 17:00
  • @hvd: My main point is that the Standard doesn't categorize programs as "Invalid" on the basis of runtime behavior. Ever since the second C compiler was produced, C has been a family of dialects which tended to provide different features and guarantees on different platforms. In many cases, dialects were defined more by precedent than formal documents; if a compiler would have to go out of its way *not* to behave in a certain fashion, and there would be no imaginable benefit, there was no perceived need to waste ink promising that a compiler would behave in the obvious fashion. – supercat Jan 29 '17 at 17:39
  • @hvd: Presently, the Standard does not define any category of program for which it specifies the meaning, but which implementations would not be required to process (which I'd call "Selectively Conforming"). While most tasks can't be accomplished efficiently by Strictly Conforming programs, they could be accomplished by Selectively Conforming programs, and many existing programs could be made Selectively Conforming merely by the addition of directives specifying requirements. – supercat Jan 29 '17 at 18:08