4

note : removed printf part, since it's explained in a different post.

I learned C from K&R 2nd edition a few yours ago. I have not used C in a while, so I decided to skim through a more modern book. “C How to Program, 8th Edition” by Deitel and Deitel (published by Pearson in 2015), which has C99 and C11.

He covers security, unlike K&R. One thing that surprised me, is integer overflow. He wrote:

Section 3.13 Secure C Programming • Adding integers can result in a value that’s too large to store in an int variable. This is known as arithmetic overflow and can cause unpredictable runtime behavior, possibly leaving a system open to attack.

In a different page he has:

It’s considered a good practice to ensure that before you perform arithmetic calculations like the one in line 18 of Fig. 2.5, they will not overflow. The code for doing this is shown on the CERT website https://www.securecoding.cert.org — just search for guideline “INT32-C.”

If you look up the code they recommend:

5.3.3.2 Compliant Solution This compliant solution ensures that the addition operation cannot overflow, regardless of representation:

#include <limits.h>

void f(signed int si_a, signed int si_b)
{
    signed int sum;

    if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
        ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
    }
    else {
        sum = si_a + si_b;
    }
    /* ... */
}

My understanding was, although unsigned int behavior is undefined, it's always a fixed size of bits. On my computer it's a 32 bit int. So on my laptop my INT_MAX = 2147483647 and if I add 1 to it i get -2147483648. If I keep adding one to it, it will eventually get to 0, then go back up to INT_MAX and keep going round and round. I don’t see how this can be attacked by someone using my code?

To add extra code, every time I add ints, seems very wasteful, unless there really is a variability I have to look out for, not just getting the wrong result.

edit: I'm adding the quote back, because of the discussion below:

Avoid Single-Argument printfs. One such guideline is to avoid using printf with a single string argument. If you need to display a string that terminates with a newline, use the puts function, which displays its string argument followed by a newline character. For example, in Fig. 2.1, line 8

printf( "Welcome to C!\n" ); should be written as: puts( "Welcome to C!" ); We did not include \n in the preceding string because puts adds it automatically. If you need to display a string without a terminating newline character, use printf with two arguments — a "%s" format control string and the string to display. The %s conversion specifier is for displaying a string. For example, in Fig. 2.3, line 8

printf( "Welcome " );should be written as:

printf( "%s", "Welcome " );

Although the printf in this chapter as written are actually not insecure, these changes are responsible coding practices that will eliminate certain security vulnerabilities as we get deeper into C.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Manny_Mar
  • 398
  • 2
  • 13
  • Possible duplicate of [How can a Format-String vulnerability be exploited?](https://stackoverflow.com/questions/7459630/how-can-a-format-string-vulnerability-be-exploited) –  Dec 16 '18 at 00:15
  • Two short thoughts. The concern on integer overflow and vulnerability has to do with how overflow may effect the function return values, sizing for memory, or where negative values computed lead to undefined behavior (as when used as an array index, etc..). You correctly note that overflow will essentially wrap around the full range of `int` if you continue to add one, etc.. which can further lead to infinite loops or other undesirable behavior. (it's good to check, the cost is negligible and a compiler will optimize) `printf` is variadic, no need to use it to output a simple string. – David C. Rankin Dec 16 '18 at 00:16
  • 3
    The problem is that undefined behavior is undefined, and signed integer overflow doesn't always wrap around. Compilers can assume undefined behavior will never occur, so they might optimize your code in a way such that overflows results in exploitable behavior instead of just wrapping around. You have to get out of the habit of making assumptions about UB. – eesiraed Dec 16 '18 at 00:32
  • @DavidC.Rankin *affect – LtWorf Dec 16 '18 at 00:40
  • Happens... `affect` ("to act on; produce an effect or change in") verses `effect` ("the result")... I'm also bad with "insure/ensure" `:)` – David C. Rankin Dec 16 '18 at 00:43
  • If you want the integer behavior of the CPU in C/C++, use **volatile** – curiousguy Dec 16 '18 at 00:45
  • Canonical: https://stackoverflow.com/questions/2913618/how-is-integer-overflow-exploitable – Hans Passant Dec 16 '18 at 00:58
  • 1
    @curiousguy: why do you think using `volatile` will affect the behaviour of integer arithmetic overflow? Can you provide any authoritative reference that explains your suggestion? – Jonathan Leffler Dec 16 '18 at 02:03
  • 1
    @curiousguy: I think I understand what `volatile` does — and it is not at all obvious to me why you think it would have any effect whatsoever on the behaviour of integer overflow. "All" `volatile` does is say that the compiler must make an appropriate reference to the volatile-qualified variable whenever the source code refers to it, so it cannot (for example) optimize reads away assuming that the value is still the same as the last time it was read — which has precisely nothing to do with how an integer expression is evaluated once the values are read, which is what's relevant to overflow. – Jonathan Leffler Dec 16 '18 at 02:28
  • @JonathanLeffler "_so it cannot (for example) optimize reads away_" Exactly. The compiler will not optimize away any access and must translate directly C/C++ code into assembly. Of course volatile semantics has no effect on whether the assembly instructions produce 2^n modulo results on integers represented in 2-complement; **as long as the CPU instructions do that** (and they often do) the compiler will exactly give you that. Historically a few CPU did not have 2-complement representation or 2^n modulo results, in which case volatile will not help you. – curiousguy Dec 16 '18 at 04:03
  • @curiousguy: I think your logic is mistaken. – Jonathan Leffler Dec 16 '18 at 04:05
  • @JonathanLeffler Unless overflow checking is enforced by the compiler, the **most efficient code that gives correct results** (in absence of UB) is generated. (Many compilers don't offer the option of integer overflow checking.) You can check yourself in the CPU documentation whether the most efficient arithmetic instructions gives you what you want. So the argument is that 1) compilers don't go out of the way, reducing efficiency, to check something unless being told to check it by a compiler option 2) the most efficient addition in assembly almost always does wrap around – curiousguy Dec 16 '18 at 04:07
  • @curiousguy: you are discussing what specific CPUs might or might not do, rather than what the standard says is the case. This is likely the disconnect. The behaviour of signed integer overflow is undefined by the standard. The use of `volataile` has no effect on this — in terms of the standard. I don't think there's any point in continuing the discussion; my part stops here. – Jonathan Leffler Dec 16 '18 at 04:09
  • @JonathanLeffler I never said anything the volatile qualifier itself guaranteeing wrap around! I wrote that volatile provides the semantics of low level code, which almost always does wrap around. There is no "disconnect". Why do **you** believe a compiler writer would *not* provide that CPU op integer semantics? – curiousguy Dec 16 '18 at 04:13
  • 1
    If you want to guarantee wraparound behavior for signed integers, nemequ's suggestion of passing flags to your specific compiler that force it to use wraparound semantics for signed integers is the only reliable approach. – Ray Dec 16 '18 at 06:09
  • @curiousguy On second thought, you're right about that part. volatile *would* make the compiler assume that x might have been changed by hardware at some point before the conditional, so it wouldn't optimize it away. But the rest of objection stands. Wraparound semantics are not guaranteed. – Ray Dec 16 '18 at 06:16
  • @Ray "_Wraparound semantics are not guaranteed_" (by neither the C nor C++ std). But combining expectations WRT code gen (smaller, fastest impl. of int ops) and the CPU document, you get something. – curiousguy Dec 16 '18 at 06:23
  • @Ray no, they're not that either. There were some bugs even in `-fwrapv` of GCC... – Antti Haapala -- Слава Україні Dec 16 '18 at 06:28
  • 2
    @curiousguy please read and understand [this article](https://www.viva64.com/en/b/0374/) before you continue this "smaller, fastest impl. of int ops" mantra) – Antti Haapala -- Слава Україні Dec 16 '18 at 09:12
  • @AnttiHaapala Where does that article mentions volatile? – curiousguy Dec 16 '18 at 17:36
  • @Ray "_nemequ's suggestion of passing flags to your specific compiler_" But then you must compile the whole program with these exact flags, right? – curiousguy Dec 16 '18 at 17:41
  • @curiousguy nowhere. It is regarding "smaller, faster, impl of int ops" – Antti Haapala -- Слава Україні Dec 16 '18 at 17:55
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/185324/discussion-between-curiousguy-and-antti-haapala). – curiousguy Dec 16 '18 at 18:00
  • 1
    The discussion of printf format strings has nothing whatsoever to do with the rest of the question. I would suggest removing it, stick to one topic per question. – M.M Dec 20 '18 at 03:17
  • Readers beware: my answer indicating that volatile would probably fix the problem was downvoted then deleted, without anyone emitting an objection to any part of it. Just because people don't feel like using volatile. Programming isn't about feelings. – curiousguy Dec 31 '18 at 10:55

4 Answers4

2

Integer overflow

Behavior on overflow for signed integers is undefined. (That's actually the example used when defining undefined behavior in C11 3.4.3). It's not uncommon for systems to wrap around, but it's not universal, and even if that is how the system handles it, the compiler is permitted to assume that overflow will not occur and optimize accordingly.

Unsigned integers are guaranteed to wrap around on overflow (or more formally, the value is reduced modulo the max representable value) (C11 6.2.5.9). You should still be aware of whether the value is in a range where this is likely to happen, since having well-defined behavior isn't especially helpful if that behavior isn't what you wanted it to do.

As for whether you actually need to add those checks before each addition...unless it's really critical code (and I mean space shuttle guidance system or pacemaker controller levels of mission critical), my opinion is that those checks would be overkill most of the time. Instead, be aware of what sort of range of values are possible, and choose a datatype such that the value will never be in the sort of range where it might overflow (and if there are cases where it might, test it in those places and only those places). To ensure this in cases where the numbers are user-supplied, you might have some explicit checks when the user first enters the data so as to verify that the values are in sane ranges. (Of course, if you do that, there is always the risk that the user and you will have different ideas as to what ranges are sane, but better to reject an input than to accept it and return incorrect results.)

printf

There's no risk in printf("Hello, world!\n"). It's possible that puts("Hello, world!") will compile to more efficient code, but I doubt it'd make a noticeable difference; the bottleneck will be the actual I/O.

But there is a risk if you do printf(s), where s contains user-supplied data. If the user causes s to contain, e.g., "Foo %s", then it will attempt to run printf("Foo %s"), scan to the "%s", try to read the (nonexistent) next argument, and crash (or do some other undefined behavior).

Ray
  • 1,706
  • 22
  • 30
2
  1. You don't need to check every single operation, only the ones that could overflow. One obvious example is any operations based on untrusted input, but even then it may be more appropriate to do a sanity check on arguments instead of generically checking all math operations.

    Ray's statement that "… unless it's really critical code (and I mean space shuttle guidance system or pacemaker controller levels of mission critical), my opinion is that those checks would be overkill most of the time …" is dangerous. Yes, he's right that you don't need to check every operation, but you should always check the ones which can overflow.

    https://undeadly.org/cgi?action=article&sid=20060330071917 has a good example of where this can bite you, even if you're not writing a space shuttle guidance system or pacemaker controller.

    It's also worth noting that compilers can sometimes optimize away checks that it can prove will never fail. For example, if you were to call f() with two constants and check the result, there is a decent chance that the compiler will completely optimize it away, especially at higher optimization levels.

  2. The code suggested by CERT is portable, but generally not the fastest option. GCC and clang have __builtin_*_overflow intrinsics, and on Windows there is an <intsafe.h>. If a larger type is available (e.g., if you want to check the result of a 32-bit operation and you have 64-bit types available) it should be pretty quick to perform the operation using the larger type then cast back. If you want some portable code to do it, there is a safe-math module in portable-snippets (disclaimer: my project) you can steal.
  3. Static analysis tools can be very helpful here; several of them can detect potential integer overflows. I know I've seen such errors from Coverity, and I think I've seen them from scan-build and cppcheck.
nemequ
  • 16,623
  • 1
  • 43
  • 62
  • `Ray's statement...is dangerous. ...you don't need to check every operation, but you should always check the ones which can overflow.` I agree, and said something to that effect a bit later in the answer. The space shuttle and pacemaker cases are the ones where you should check *everywhere* just in case. – Ray Dec 16 '18 at 05:55
0

1) On some machines, x = MAX_INT + 1 causes an overflow exception to be raised, thus causing an alteration in control flow that may be exploitable.

2) Checks on limits are at risk from overflow. On a machine without overflow exceptions, it is not always true that x + y >= x even where x and y are both positive. Thus, if the attacker can supply a length, a naïve check like if (pointer + length < end of allocated space) can be bypassed by supplying a very large length.

EDITED: to remove printf part of answer, since that part of the question has been removed. The linked page gives the answer I gave, in more detail, so that part of my answer is redundant.

  • 3 is questionable and the term is more properly *"conversion specifiers"* rather than *"insertion-specifier"*. It's hard to see how it could end up accessing thing that they shouldn't -- unless the programmer has wholly failed to insure a *nul-terminated* string is passed. The point being, don't use `printf ("%s\n", "the string");` to simply `puts ("the string");` which is akin to not using `printf ("%c", 'a');` where `putchar('a');` is proper (even though a good compiler will optimize `printf` to `puts` or `putchar` in both cases) – David C. Rankin Dec 16 '18 at 00:22
  • In many implementations, ```printf("%s")``` will read off the end of the argument list, treating whatever happens to be there as a pointer, and will merrily fetch characters from there until a zero byte is encountered. The ```%s``` was not literally supplied by the programmer, it was read as input (network, user, whatever). See the link posted above to format-string vulnerabilities. –  Dec 16 '18 at 00:30
  • Granted, a *misuse* of `printf` can do bad things (as a misuse of any other function can). The compiler in that case is required to warn of a missing argument. (the C11 Standard also defines a mismatch in printf argument types as *Undefined Behavior* [C11 §7.21.6.1(p2 & p9) The fprintf function - insufficient arguments, or incorrect type.](http://port70.net/~nsz/c/c11/n1570.html#7.21.6.1p9)) – David C. Rankin Dec 16 '18 at 00:31
  • 1
    The compiler can warn of nothing in the case I am talking about. ```char *s = read_string_from_socket(); printf(s);``` The content of ```s``` is not known until execution. And, yes, the behavior is undefined; but it is nevertheless exploitable by an attacker. –  Dec 16 '18 at 00:36
  • Like I said, the *misuse* of `printf` can and will cause problems. Your example violates the `"const char * restrict format, ..."` definition of `printf/fprintf` See [C11 Standard - 7.21.6.1p1](http://port70.net/~nsz/c/c11/n1570.html#7.21.6.1p1) – David C. Rankin Dec 16 '18 at 00:41
  • Relative to printf with a single argument, the OP asked, in text no longer there, "how can it be exploited". Both I and user Ray answered that in the same way: it can be exploited if the format-string can be supplied by an attacker. Arguing that this is a "misuse" of printf doesn't ameliorate the vulnerability. And now I leave the discussion to [this page](https://stackoverflow.com/questions/7459630/how-can-a-format-string-vulnerability-be-exploited) which does a better job. –  Dec 16 '18 at 00:50
  • Yes, granted, and he should be taken out and shot for deleting information from the original question (but being a relatively new S.O. user, he will be spared -- this time). Don't get me wrong, I agree with what you are saying `printf (s);` is bad, real bad, but it is also a complete misuse of `printf`. No need to re-hash, I understand what you are saying. – David C. Rankin Dec 16 '18 at 00:56
  • I added the quote back, since people are talking about it. – Manny_Mar Dec 16 '18 at 01:17
0

although unsigned int behavior is undefined, [...]

You are misunderstanding "undefined". It really does mean undefined. There is no definition of behaviour. Anything at all can happen. You should not expect, let alone rely on, any particular behaviour as you describe in your answer.

Here are some of the possibilities:

  • Appears to work like wraparound arithmetic
  • Appears to work like saturating arithmetic
  • Prints unexpected thing
  • Formats the hard drive
  • Launches missiles
  • Sets the computer on fire
  • (anything else you want to put here)

See What is undefined behaviour?

M.M
  • 138,810
  • 21
  • 208
  • 365