19

The situation is the following:

  1. a 32bit integer overflows
  2. malloc, which is expecting a 64bit integer, uses this integer as input

Now on a 64bit machine, which statement is correct (if any at all):

Say that the signed binary integer 11111111001101100000101011001000 is simply negative due to an overflow. This is a practical, existing problem, since you might want to allocate more bytes than you can describe in a 32bit integer. But the value then gets read in as a 64bit integer.

  1. Malloc reads this as a 64bit integer, finding 11111111001101100000101011001000################################ with # being a wildcard bit representing whatever data is stored after the original integer. In other words, it reads a value close to its maximum of 2^64 and tries to allocate some quintillion bytes. It fails.
  2. Malloc reads this as a 64bit integer, casting to 0000000000000000000000000000000011111111001101100000101011001000, possibly because that is how it is loaded into a register, leaving the upper bits zero. It does not fail, but allocates memory, treating the negative value as a positive unsigned one.
  3. Malloc reads this as a 64bit integer, casting to ################################11111111001101100000101011001000, possibly because that is how it is loaded into a register, with # a wildcard representing whatever data was previously in the register. It fails quite unpredictably, depending on that previous value.
  4. The integer does not overflow at all because even though it is 32bit, it is still in a 64bit register and therefore malloc works fine.

I actually tested this, and the malloc failed (which would imply that either 1 or 3 is correct). I assume 1 is the most logical answer. I also know the fix (using size_t as input instead of int).

I'd just really like to know what actually happens. For some reason I can't find any clarification on how 32bit integers are actually treated on 64bit machines for such an unexpected 'cast'. I'm not even sure whether it being in a register actually matters.
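
For concreteness, here is a minimal sketch of the kind of code I mean (the sizes and variable names are made up for illustration, not my actual code):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Hypothetical reconstruction of the scenario: two counts whose
       product no longer fits in a signed 32-bit int. */
    int width  = 50000;
    int height = 50000;

    /* Signed overflow happens here: 50000 * 50000 > INT_MAX.
       From this point on the behavior is formally undefined;
       on common compilers the result ends up negative. */
    int bytes = width * height;

    printf("bytes as int: %d\n", bytes);

    /* malloc takes a size_t; the (possibly negative) int is implicitly
       converted to the unsigned size_t at this call. */
    void *p = malloc(bytes);
    printf("malloc returned %p\n", p);

    free(p);
    return 0;
}
```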

Hadi
  • 5,328
  • 11
  • 46
  • 67
user3472774
  • 201
  • 2
  • 6
  • 1
it depends on the architecture. Some architectures don't actually have 32bit numbers per se; they treat all operations as 64bit. Regardless, signed overflow is undefined behavior; unsigned overflow is well documented and is covered by how two's complement works. – Mgetz Mar 28 '14 at 13:55
  • It depends on the system, but most likely it will just overflow like any other integer – nikola-miljkovic Mar 28 '14 at 13:56
  • 4
    @nikolaMM94 However, overflowing a signed integer is undefined behavior - a lot of different things could happen depending on the circumstances. – nos Mar 28 '14 at 13:57
  • @nos Exactly. What I meant to say is that this is not a special case – nikola-miljkovic Mar 28 '14 at 14:10
  • @nikolaMM94 But will it be read as ~2^64 from the 64bit perspective of malloc? – user3472774 Mar 28 '14 at 14:12
  • 2
    The rules for this situation are known in the standard as the "usual arithmetic conversions". How it applies to your case depends on exactly how you called `malloc`. (`malloc(a+b)` is not the same as `c = a + b; malloc(c)` if `sizeof(c) < sizeof(int)`.) Follow the rules for usual arithmetic conversions to see what happens in this case. – Raymond Chen Mar 28 '14 at 14:17
  • 3
`malloc` is expecting a 64 bit integer? No, it is expecting an integer of type `size_t`. You really should look up the basics of integer types in C (or C++?); in particular, overflow is not at all the same if the type is signed or unsigned. – Jens Gustedt Mar 28 '14 at 14:26
  • 1
    @user3472774 You really can't tell. If an overflow occurs, it depends. I've debugged such problems, and in some cases the overflow will just spill over into the upper bits of a 64 bit register, in other cases it sign extends it, and you end up with a negative 64 bit value. This is unrelated to malloc, and related to your platform and how the compiler decided to generate code for the particular piece of code that overflowed. – nos Mar 28 '14 at 14:31
  • 7
    Better answers will result if you _post the code_ rather than just the description of the code. – chux - Reinstate Monica Mar 28 '14 at 15:23
  • `malloc()` is expecting a `size_t` which is an unsigned integer type. In C, "overflow" of unsigned integer types is different from "overflow" of signed integer types. To say you have "n-bit integer" is insufficient. – chux - Reinstate Monica Mar 28 '14 at 15:26
  • Me thinks you are asking about type conversion (32-bit to 64-bit types) instead of overflow. Your examples did not include (the common) case of sign extension. – brian beuning Apr 20 '14 at 14:46

3 Answers

18

The problem with your reasoning is that it starts with the assumption that the integer overflow will result in a deterministic and predictable operation.

This, unfortunately, is not the case: undefined behavior means that anything can happen, and notably that compilers may optimize as if it could never happen.

As a result, it is nigh impossible to predict what kind of program the compiler will produce if there is such a possible overflow.

  • A possible output is that the compiler elides the allocation because it cannot happen
  • A possible output is that the resulting value is 0-extended or sign-extended (depending on whether it's known to be positive or not) and interpreted as an unsigned integer. You may get anything from 0 to size_t(-1) and thus may allocate either too few or too much memory, or even fail to allocate, ...
  • ...

Undefined Behavior => All Bets Are Off
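
To illustrate the "compilers may optimize as if it could never happen" point, here is a classic sketch (a made-up function, not tied to malloc) of a check that an optimizer is entitled to remove entirely:

```c
#include <limits.h>
#include <stdio.h>

/* `n + 100 < n` can only be true if the signed addition overflowed, and
   since signed overflow is undefined behavior the compiler may assume it
   never happens and fold the comparison to 0. */
int will_wrap(int n)
{
    return n + 100 < n;
}

int main(void)
{
    /* Commonly prints 1 without optimization and 0 at -O2,
       but neither outcome is guaranteed by the standard. */
    printf("%d\n", will_wrap(INT_MAX));
    return 0;
}
```

The same reasoning applies to an overflowing size computation: the compiler is allowed to assume it does not overflow and transform the surrounding code accordingly.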

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • Thank you for giving examples of possible behaviors that are actually somewhat likely rather than talking about "nasal demons" and "formatting the hard drive" and whatnot. – Kyle Strand Mar 28 '14 at 23:20
  • @Kyle, nasal demons are certainly fictitious, but this sort of bug could format the hard drive, or launch missiles if it were to occur on a missile control system! (malloc'ing not enough bytes and then writing beyond that). – M.M Apr 20 '14 at 14:52
  • @MattMcNabb Sure, as a side-effect of one of the "reasonable" compiler responses to code with undefined behavior. But no one's going to write a (mainstream) compiler that, upon seeing a statement with undefined behavior, inserts code *designed* to cause damage just because to do so would be technically permissible according to the standard. So it's valuable, I think, to consider what kind of code is actually likely to be produced by a compiler; I find that people frequently avoid those conversations, though, with comments to the effect that it doesn't matter since any code would be conforming. – Kyle Strand Apr 21 '14 at 15:39
  • 2
    @KyleStrand: In gcc, something like `unsigned mulMod65535(unsigned short x, unsigned short y) { return (x*y) & 0xFFFF;}` may have weird side-effects in the calling code if `x*y` exceeds 0x7FFFFFFF, but the authors of gcc don't see that as a problem. The authors of the C Standard, in the published Rationale, describe how they'd expect the majority of current implementations to act in cases where an integer expression yielding a numerical value between INT_MAX+1u and UINT_MAX is coerced to `unsigned`, but the question of whether to actually behave that way is a Quality of Implementation issue. – supercat Oct 09 '18 at 21:35
13

Once an integer overflows, using its value results in undefined behavior. A program that uses the result of an int after the overflow is invalid according to the standard -- essentially, all bets about its behavior are off.

With this in mind, let's look at what's going to happen on a computer where negative numbers are stored in two's complement representation. When you add two large 32-bit integers on such a computer, you get a negative result in case of an overflow.

However, according to the C++ standard, the type of malloc's argument, i.e. size_t, is always unsigned. When you convert a negative number to an unsigned number of greater width, it gets sign-extended (see this answer for a discussion and a reference to the standard), meaning that the top 32 bits of the unsigned result are filled with copies of the original's most significant bit (which is 1 for all negative numbers).

Therefore, what you get is a modified version of your third case, except that instead of "wildcard bit #" it has ones all the way to the top. The result is a gigantic unsigned number (roughly 16 exbibytes or so); naturally malloc fails to allocate that much memory.
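
As a small sketch (assuming a 64-bit size_t and two's-complement integers, as in your question), this prints what the bit pattern you posted becomes after the conversion:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The bit pattern from the question, 11111111001101100000101011001000,
       written in hex. Interpreted as a 32-bit signed int it is negative
       (the cast below is implementation-defined, but gives the expected
       negative value on ordinary two's-complement machines). */
    int32_t n = (int32_t)0xFF360AC8u;

    /* Passing n where a size_t is expected converts it to unsigned.
       With a 64-bit size_t the result is 2^64 + n: the upper 32 bits
       are all ones, i.e. a gigantic positive request. */
    size_t as_size = (size_t)n;

    printf("as int32_t: %" PRId32 "\n", n);
    printf("as size_t : %zu (0x%zx)\n", as_size, as_size);
    return 0;
}
```

The size_t value is just below 2^64, which is why the allocation fails.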

Community
  • 1
  • 1
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • 1
    You base your reasoning on the assumption that after overflow the `int` is negative, this is not necessarily the case. It could be `1` and `malloc` would then allocate exactly `1` byte... it's actually a common source of crashes (and exploits). – Matthieu M. Mar 28 '14 at 14:55
  • @MatthieuM. OP mentions that he has tested this, and `malloc` failed to allocate, making him believe that it's "either 1 or 3". That is why I am reasonably certain that he's getting a negative number. – Sergey Kalinichenko Mar 28 '14 at 15:04
  • 1
It is unclear what the OP tested; just because he *once* got a negative number that, once extended, yielded a huge number does not mean it will *always* turn out this way. – Matthieu M. Mar 28 '14 at 15:25
  • I'm not sure it matters if you "use" the value of the integer that overflows. If you add an `unsigned x` to an `int n` that is known to be `INT_MAX`, the compiler should be able to infer that `x` must be zero, regardless of whether you use `n` again. – Samuel Edwin Ward Mar 28 '14 at 18:49
  • Accepting this answer because it **both** gives a general answer emphasizing the undefined behavior of overflows _and_ explains this specific case. – user3472774 Apr 01 '14 at 09:06
3

So if we have a specific code example, a specific compiler, and a specific platform, we can probably determine what the compiler is doing; that is the approach taken in Deep C. But even then it may not be fully predictable, which is a hallmark of undefined behavior, and generalizing about undefined behavior is not a good idea.

We only have to look at the gcc documentation to see how messy this can get. It offers some good advice on integer overflow, saying:

In practice many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors expected.

and the sub-section Practical Advice for Signed Overflow Issues says:

Ideally the safest approach is to avoid signed integer overflow entirely.[...]

At the end of the day it is undefined behavior, and therefore unpredictable in the general case; but in gcc's case, the section on implementation-defined integer behavior says that out-of-range conversions wrap around:

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

but in its advice about integer overflow it explains how optimization can cause problems with wraparound:

Compilers sometimes generate code that is incompatible with wraparound integer arithmetic.

So this quickly gets complicated.
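
If you are using gcc (or clang) anyway, one way to follow the "avoid signed integer overflow entirely" advice is the checked-arithmetic builtins gcc provides (since gcc 5). This is only a sketch with made-up sizes:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int width = 50000, height = 50000;   /* made-up sizes */
    size_t bytes;

    /* __builtin_mul_overflow computes the mathematically exact product
       and reports whether it fits in the result type (size_t here),
       so no signed overflow ever takes place. */
    if (__builtin_mul_overflow(width, height, &bytes)) {
        fprintf(stderr, "size computation overflowed\n");
        return EXIT_FAILURE;
    }

    void *p = malloc(bytes);   /* bytes is a well-defined size_t value */
    if (p == NULL) {
        fprintf(stderr, "allocation of %zu bytes failed\n", bytes);
        return EXIT_FAILURE;
    }

    free(p);
    return 0;
}
```

A portable alternative is to do the whole size computation in size_t and reject it up front if one factor exceeds SIZE_MAX divided by the other.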

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • The compiler can assume that no undefined behaviour will happen, therefore no 32 bit integer overflow. So it can generate any code that will generate the correct result whenever there is no overflow. One possibility would be converting x, y both to signed 64 bit, adding as 64 bit, then interpreting as unsigned. This is correct whenever there is no 32 bit overflow, so it is legal. Of course, different code giving different results is legal as well. – gnasher729 Mar 28 '14 at 18:13
  • @gnasher729 which is basically what I said – Shafik Yaghmour Mar 28 '14 at 18:17