
I've read a lot of articles about undefined behavior (UB), but they all stick to theory. I'm wondering what can happen in practice, because programs containing UB may actually run.

My questions relate to Unix-like systems, not embedded systems.

I know that one should not write code that relies on undefined behavior. Please do not post answers like these:

  • Everything could happen
  • Daemons can fly out of your nose
  • Computer could jump and catch fire

The first one in particular is not true: you obviously cannot get root by doing a signed integer overflow. I'm asking for educational purposes only.

Question A)

Source

implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made

Is the implementation the compiler?

Question B)

*"abc" = '\0';

For something other than a segfault to happen, does my system have to be broken? What could actually happen, even if it is not predictable? Could the first byte be set to zero? What else, and how?

Question C)

int i = 0;
foo(i++, i++, i++);

This is UB because i is modified several times with no intervening sequence point (and the order in which the arguments are evaluated is unspecified). Right. But when the program runs, who decides in what order the arguments are evaluated: is it the compiler, the OS, or something else?

Question D)

Source

$ cat test.c
#include <stdio.h>
#include <limits.h>
int main (void)
{
    printf ("%d\n", (INT_MAX+1) < 0);
    return 0;
}
$ cc test.c -o test
$ ./test
Formatting root partition, chomp chomp

According to other SO users, this is possible. How could this happen? Do I need a broken compiler?

Question E)

Use the same code as above. What could actually happen, other than the expression (INT_MAX+1) yielding a random value?

Question F)

Does the GCC -fwrapv option define the behavior of signed integer overflow, or does it only make GCC assume that it will wrap around, while at runtime it could in fact not wrap around?

Question G)

This one concerns embedded systems. Of course, if the PC jumps to an unexpected place, the code it lands on could, for example, drive two outputs against each other and create a short circuit.

But, when executing code similar to this:

*"abc" = '\0';

Wouldn't the PC be vectored to the general exception handler? Or what am I missing?

Donald Duck
Bilow
  • For a good working model of what to expect with undefined behaviour, use your favorite search engine on "nasal demons". In short, there is no logic, no guarantee, if the hardware has a self-destruct sequence, it is allowed to be triggered. If it is an ATM it is allowed to shower you in money. – Yunnosch Sep 08 '17 at 15:35
  • implementation = compiler + libraries + operating system – Barmar Sep 08 '17 at 15:44
  • "Undefined behavior" means that the compiler may or may not do something sensible when it encounters the situation. The various examples about nasal demons and root partition formatting just serve to indicate that, if a compiler writer did that in case of undefined behavior, there's nothing in the standard forbidding them to do it lest they become non-standard-compliant. – Federico klez Culloca Sep 08 '17 at 15:44
  • From the link you supplied: **Will a real compiler emit code to chomp your disk? Of course not, ...** More generally, the compiler can't generate code that gets around the operating system's security policy (if it could, you could write an assembly program that did the same thing). – Barmar Sep 08 '17 at 15:48
  • But imagine if you have undefined behavior in code that's running in the kernel itself. Since the kernel implements the security policy, it's not bound by it, and a bug can result in just about anything. – Barmar Sep 08 '17 at 15:50
  • @Barmar Thanks for pointing out UB in the kernel, I hadn't thought about that at all – Bilow Sep 08 '17 at 15:52
  • This is also why you need to be extremely careful when writing setuid programs and system daemons. Most security exploits are due to the undefined behavior that comes from writing outside array bounds. – Barmar Sep 08 '17 at 15:54
  • What are you really asking here? Is it (a) "I don't believe UB could really reformat my desk or cause demons to fly out of my nose. I want the people who said that to admit they were exaggerating." Or is it (b) "I don't believe UB is *that* bad, so if I know what I'm doing, and if I have a good reason to, I can write undefined code with reasonable confidence that nothing *too* bad will happen." If you're asking (b), let me tell you, undefined behavior truly can lead to arbitrarily bad results, and you really do want to learn how to shun it. See the links in the answer I posted. – Steve Summit Sep 08 '17 at 15:57
  • It is (c) "all I ask is for educational purpose, I will strive to not have UB in my code" – Bilow Sep 08 '17 at 16:00
  • There are a lot of interesting cases of what the compiler does with UB. For example transforming finite loops into infinite loops [example 1](https://stackoverflow.com/q/32506643/1708801) and [example 2](https://stackoverflow.com/q/24296571/1708801). My [answer here has a number of good UB references](https://stackoverflow.com/a/31746063/1708801). – Shafik Yaghmour Sep 08 '17 at 18:07
  • You cannot ask multiple questions at once. Stick to one and only one question at a time. – JK. Sep 09 '17 at 11:06

4 Answers


In practice, most compilers deal with undefined behavior in one of the following ways:

  • Print a warning at compile time, to inform the user that they probably made a mistake
  • Infer properties of the values of variables and use those to simplify code
  • Perform unsafe optimizations, as long as they only change the behavior of code paths whose behavior was undefined anyway

Compilers are not usually designed to be malicious. The main reason to exploit undefined behavior is to get some performance benefit from it. But sometimes that can involve eliminating entire blocks of code as dead.
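
As an illustration of the last two points (a sketch only; whether a given compiler actually does this depends on its version and flags): dereferencing a null pointer is UB, so an optimizer is allowed to assume the pointer is non-null and delete a later check as dead code.

int f(int *p) {
    int x = *p;        /* UB if p is NULL                              */
    if (p == NULL)     /* ...so the optimizer may assume this is false */
        return -1;     /* ...and remove this branch as dead code       */
    return x;
}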

A) Yes. The implementation is the compiler together with its libraries and the target platform, and it should document which behavior it chose for implementation-defined constructs. For undefined behavior, on the other hand, the consequences are usually hard to predict or explain.
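
For contrast, here is a small example of implementation-defined (not undefined) behavior; the exact result is a property of the platform, but the implementation must document its choice:

#include <stdio.h>

int main(void) {
    int x = -2;
    /* Right-shifting a negative signed value is implementation-defined:
     * the implementation must document whether it performs an arithmetic
     * or a logical shift.  GCC, for instance, documents it as arithmetic,
     * so this typically prints -1.                                       */
    printf("%d\n", x >> 1);
    return 0;
}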

B) If the string is actually instantiated in memory and lives in a writable page (by default it will be in a read-only page), then its first character might become a null character. More likely, the entire expression will be thrown out as dead code, because the result is a temporary value that is never used.
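
A minimal program exercising question B (a sketch; on a typical Unix-like system the literal sits in a read-only mapping, so this usually dies with SIGSEGV, but none of that is guaranteed):

#include <stdio.h>

int main(void) {
    char *s = "abc";    /* string literal, typically placed in a read-only segment */
    s[0] = '\0';        /* undefined behavior: modifying a string literal          */
    printf("%s\n", s);  /* on most Unix-like systems this line is never reached    */
    return 0;
}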

C) Usually, the order of evaluation is decided by the compiler. Here it might decide to transform the whole call into i += 3 (or into setting i to anything at all, if it is being silly). The CPU could reorder instructions at run time, but it must preserve the semantics of the instruction stream the compiler emitted; the compiler usually cannot forward the C-level latitude further down. An increment of a register cannot be reordered past, or executed in parallel with, another increment of the same register.
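
A sketch of question C that makes the difference observable (foo here is just a placeholder that prints its arguments); the program has undefined behavior, and different compilers or optimization levels commonly print different triples and/or emit a warning:

#include <stdio.h>

static void foo(int a, int b, int c) {
    printf("%d %d %d\n", a, b, c);
}

int main(void) {
    int i = 0;
    /* Unsequenced modifications of i: undefined behavior.  GCC typically
     * warns with -Wsequence-point, clang with -Wunsequenced, and the
     * values actually passed vary between compilers.                    */
    foo(i++, i++, i++);
    return 0;
}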

D) You would need a rather silly compiler that prints "Formatting root partition, chomp chomp" when it detects undefined behavior. Much more likely, it will emit a warning at compile time, replace the expression with a constant of its choice, and produce a binary that simply performs the print with that constant.

E) It is a syntactically correct program, so the compiler will certainly produce a "working" binary. In theory, that binary could behave like any binary you might download from the internet and run. Most probably, you will get a binary that exits straight away, or one that prints the aforementioned message and then exits straight away.

F) It tells GCC to give signed integer overflow two's-complement wrap-around semantics, so the compiler must produce a binary that actually wraps around at run time. That is rather easy, because most architectures wrap anyway. The reason C makes overflow UB in the first place is so that compilers can assume a + 1 > a, which is critical for proving that loops terminate and/or for predicting branches. That is why using a signed integer as a loop induction variable can lead to faster code, even though it is mapped to exactly the same instructions in hardware.
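
A small experiment illustrating this (the result without -fwrapv is not guaranteed, since the overflow is undefined; the command lines below are only a suggestion):

#include <stdio.h>
#include <limits.h>

int main(void) {
    int x = INT_MAX;
    /* Without -fwrapv the optimizer may fold this to 1, because it is
     * allowed to assume x + 1 > x for signed x.  With -fwrapv, x + 1
     * must wrap to INT_MIN, so the comparison yields 0.                */
    printf("%d\n", x + 1 > x);
    return 0;
}

Compiled with something like gcc -O2 this commonly prints 1; with gcc -O2 -fwrapv it has to print 0.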

G) Undefined behavior is undefined behavior. The produced binary could indeed run any instructions, including a jump to an unspecified place... or it could cleanly trigger an exception. Most probably, though, your compiler will simply get rid of the unnecessary operation.
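
On a Unix-like system (the asker's main scope), the rough analogue of being vectored to an exception handler is the kernel turning the MMU fault into a signal, which you can observe with a handler. A sketch, assuming the store is not optimized away:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_segv(int sig) {
    (void)sig;
    /* write() is async-signal-safe, printf() is not */
    const char msg[] = "caught SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main(void) {
    signal(SIGSEGV, on_segv);
    char *s = "abc";
    s[0] = '\0';                 /* UB: usually faults on a read-only page   */
    puts("store did not fault"); /* might also happen; nothing is guaranteed */
    return 0;
}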

Nonyme
  • Regarding (D), you don't need a silly compiler, all you need is an instruction set where overflow results in a trap, with an uninitialized handler that jumps into no-man's-land, which happens to contain this function call. There are probably other ways to make it happen, too. – trent Sep 08 '17 at 16:36

You obviously cannot get root by doing a signed integer overflow.

Why not?

If you assume that signed integer overflow can only yield some particular value, then you're unlikely to get root that way. But the thing about undefined behavior is that an optimizing compiler can assume that it doesn't happen, and generate code based on that assumption.

Operating systems have bugs. Exploiting those bugs can, among other things, invoke privilege escalation.

Suppose you use signed integer arithmetic to compute an index into an array. If the computation overflows, you could accidentally clobber some arbitrary chunk of memory outside the intended array. That could cause your program to do arbitrarily bad things.
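
A contrived sketch of that scenario (buffer, mark, and the parameters are all made up for illustration):

static char buffer[16];            /* the intended array */

/* If count * size overflows int, the value of offset is the result of
 * undefined behavior; in practice the store below may then land far
 * outside buffer, clobbering unrelated memory, crashing, or appearing
 * to work.                                                            */
void mark(int count, int size) {
    int offset = count * size;     /* may overflow: undefined behavior */
    buffer[offset] = 1;            /* potential out-of-bounds write    */
}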

If a bug can be exploited deliberately (and the existence of malware clearly indicates that that's possible), it's at least possible that it could be exploited accidentally.

Also, consider this simple contrived program:

#include <stdio.h>
#include <limits.h>
int main(void) {
    int x = INT_MAX;
    if (x < x + 1) {
        puts("Code that gets root");
    }
    else {
        puts("Code that doesn't get root");
    }
}

On my system, it prints

Code that doesn't get root

when compiled with gcc -O0 or gcc -O1, and

Code that gets root

with gcc -O2 or gcc -O3.

I don't have concrete examples of signed integer overflow triggering a security flaw (and I wouldn't post such an example if I had one), but it's clearly possible.

Undefined behavior can in principle make your program accidentally do anything that a program starting with the same privileges could do deliberately. Unless you're using a bug-free operating system, that could include privilege escalation, erasing your hard drive, or sending a nasty e-mail message to your boss.

Keith Thompson

To my mind, the worst thing that can happen in the face of undefined behavior is something different tomorrow.

I enjoy programming, but I also enjoy finishing a program, and going on to work on something else. I do not delight in continuously tinkering with my already-written programs, to keep them working in the face of bugs they spontaneously develop as hardware, compilers, or other circumstances keep changing.

So when I write a program, it is not enough for it to work. It has to work for the right reasons. I have to know that it works, and that it will keep working next week and next month and next year. It can't just seem to work, to have given apparently correct answers on the -- necessarily finite -- set of test cases I've run it on so far.

And that's why undefined behavior is so pernicious: it might do something perfectly fine today, and then do something completely different tomorrow, when I'm not around to defend it. The behavior might change because someone ran it on a slightly different machine, or with more or less memory, or on a very different set of inputs, or after recompiling it with a different compiler.

See also the third part of this other answer (the part starting with "And now, one more thing, if you're still with me").

Steve Summit
  • Oh - there's worse than 'something different tomorrow'. There is 'something different after testing, release and delivery and installation by 10,000 customers who now either want their money back or are talking to their lawyers' :( – Martin James Sep 08 '17 at 17:55

It used to be that you could count on the compiler to do something "reasonable". More and more often, though, compilers are truly taking advantage of their license to do weird things when you write undefined code. In the name of efficiency, these compilers are introducing very strange optimizations, which don't do anything close to what you probably want.
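
One well-known family of such surprises (compare the comments above about finite loops turning infinite): the optimizer reasons backwards from "signed overflow never happens". A minimal sketch, with no guarantee that any particular compiler treats this exact snippet this way:

#include <stdio.h>

int main(void) {
    /* i doubles each iteration and eventually overflows.  Because
     * signed overflow is UB, the compiler may assume i > 0 stays true
     * forever and turn this into an infinite loop.                    */
    for (int i = 1; i > 0; i += i) {
        printf("%d\n", i);
    }
    return 0;
}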

Read these posts:

Steve Summit
  • Before posting I had read most of your links (the first parts). They are helpful for a developer, but I'd like to know what could happen from a practical point of view. The point is, I don't expect anything, I just want to know – Bilow Sep 08 '17 at 15:50
  • @Bilow If you just think logically about how compilers and operating systems work, most of the realistic consequences should be pretty obvious. – Barmar Sep 08 '17 at 15:53
  • @Bilow I understand your curiosity. But as the Man in Black famously said, "Prepare to be disappointed." Even if we could give you a nice list of all the weird things that could actually happen today, it would be obsolete tomorrow, when optimizing compilers started doing things even more bizarre and incomprehensible. – Steve Summit Sep 08 '17 at 16:03
  • @SteveSummit I like your comment, it is down to earth, I would be very happy to read such a list and understand it, even if it will be deprecated soon – Bilow Sep 08 '17 at 16:05