How to catch undefined behaviour without executing it?

Question

In my software I am using the input values from the user at run time and performing some mathematical operations. Consider for simplicity below example:

int multiply(const int a, const int b)
{
    if(a >= INT_MAX || B >= INT_MAX)
        return 0;
    else
        return a*b;
}

I can check if the input values are greater than the limits, but how do I check if the result will be out of limits? It is quite possible that a = INT_MAX - 1 and b = 2. Since the inputs are perfectly valid, it will execute the undefined code which makes my program meaningless. This means any code executed after this will be random and eventually may result in crash. So how do I protect my program in such cases?

There is no general answer to "how do I prevent UB?". For this particular case: http://stackoverflow.com/questions/2713972/how-do-i-detect-overflow-while-multiplying-two-2s-complement-integers — Oliver Charlesworth, Jun 21 '14 at 07:04
related http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflow-in-c-c — uchar, Jun 21 '14 at 07:05
In this particular case, casting both to `long long` and checking if the result of the multiplication is larger than `INT_MAX` or smaller than `INT_MIN` should usually work. — T.C., Jun 21 '14 at 07:08
It's perfectly reasonable to have functions that have undefined behaviour given certain inputs as long as you document it. In fact, ignoring the idea of using a bigger type for the result, I'd prefer it if a function as simple as `multiply` didn't perform two comparisons and a logical operator every time I called it (I'm imagining it replaces the `*` operator so would be called pretty often). I at least wouldn't want it to just return 0 - sounds like a source of some confusing bugs. — Joseph Mansfield, Jun 21 '14 at 07:09
`(a >= INT_MAX)` is equivalent to `(a == INT_MAX)` if a is an int — sp2danny, Jun 21 '14 at 07:11
@JosephMansfield yes I agree with you since my requirement is for HPC. The solution pointed by Oli is perfect but not suited for HPC. But how will documenting the error help? — Cool_Coder, Jun 21 '14 at 07:12
@Cool_Coder Well I have to ask why `*` is not suitable for multiplying two numbers? The nice thing about the built-in operators is that they work out these kind of issues for you (at 0 cost). — Joseph Mansfield, Jun 21 '14 at 07:13
* _is_ suitable for multiplying numbers. When did I question that? I was asking that how would documenting this undefined behaviour help if the error is caused by the user at run time... — Cool_Coder, Jun 21 '14 at 07:16
@Cool_Coder `But how will documenting the error help?` If the user fails to read the documentation, then it's their problem, not yours. — PaulMcKenzie, Jun 21 '14 at 07:17
Rather than returning 0 when the parameters are out of range, make two fuctions, e.g. `check_multiply` and `do_multiply`. The check function will make sure the constraints you defined are satisfied and can alert the user if they are not — Brandin, Jun 21 '14 at 07:33

score 5 · Accepted Answer · answered Jun 21 '14 at 07:23

This really comes down to what you actually want to do in this case.

For a machine where long or long long (or int64_t) is a 64-bit value, and int is a 32-bit value, you could do (I'm assuming long is 64 bit here):

long x = static_cast<long>(a) * b;
if (x > MAX_INT || x < MIN_INT)
   return 0;
else
   return static_cast<int>(x);

By casting one value to long, the other will have to be converted as well. You can cast both if that makes you happier. The overhead here, above a normal 32-bit multiply is a couple of clock-cycles on modern CPU's, and it's unlikely that you can find a safer solution, that is also faster. [You can, in some compilers, add attributes to the if saying that it's unlikely to encourage branch prediction "to get it right" for the common case of returning x]

Obviously, this won't work for values where the type is as big as the biggest integer you can deal with (although you could possibly use floating point, but it may still be a bit dodgy, since the precision of float is not sufficient - could be done using some "safety margin" tho' [e.g. compare to less than LONG_INT_MAX / 2], if you don't need the entire range of integers.). Penalty here is a bit worse tho', especially transitions between float and integer isn't "pleasant".

Another alternative is to actually test the relevant code, with "known invalid values", and as long as the rest of the code is "ok" with it. Make sure you test this with the relevant compiler settings, as changing the compiler options will change the behaviour. Note that your code then has to deal with "what do we do when 65536 * 100000 is a negative number", and your code didn't expect so. Perhaps add something like:

 int x = a * b;
 if (x < 0) return 0;

[But this only works if you don't expect negative results, of course]

You could also inspect the assembly code generated and understand the architecture of the actual processor [the key here is to understand if "overflow will trap" - which it won't by default in x86, ARM, 68K, 29K. I think MIPS has an option of "trap on overflow"], and determine whether it's likely to cause a problem [1], and add something like

#if (defined(__X86__) || defined(__ARM__))
 #error This code needs inspecting for correct behaviour 
#endif
    return a * b;

One problem with this approach, however, is that even the slightest changes in code, or compiler version may alter the outcome, so it's important to couple this with the testing approach above (and make sure you test the ACTUAL production code, not some hacked up mini-example).

[1] The "undefined behaviour" is undefined to allow C to "work" on processors that have trapping overflows of integer math, as well as the fact that that a * b when it overflows in a signed value is of course hard to determine unless you have a defined math system (two's complement, one's complement, distinct sign bit) - so to avoid "defining" the exact behaviour in these cases, the C standard says "It's undefined". It doesn't mean that it will definitely go bad.

Your solution is great except for 2 situations: 1. On 32 bit machines LONG_MAX == INT_MAX. So your solution will result in UB. 2. On any machine if the result >LONG_MAX. Apart from that I like your technique — Cool_Coder, Jun 21 '14 at 07:35
Right, I explained that the "`long` is assumed to be 64-bit, and `int` 32-bit" [which happens to be the case on my machine, but won't work on Windows, but Windows does have 64-bit types with efficient multipliciation] - it relies on having a type that is bigger than `int`, or it's pointless. — Mats Petersson, Jun 21 '14 at 07:37

Patrick Schlüter · Answer 2 · 2014-06-21T08:21:24.803

Specifically for the multiplication of a by b the mathematically correct way to detect if it will overflow is to calculate log₂ of both values. If their sum is higher than the log₂ of the highest representable value of the result, then there is overflow.

log₂(a) + log₂(b) < log₂(UINT_MAX)

The difficulty is to calculate quickly the log₂ of an integer. For that, there are several bit twiddling hacks that can be used, like counting bit, counting leading zeros (some processors even have instructions for that). This site has several implementations https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

The simplest implementation could be:

unsigned int log2(unsigned int v)
{
  unsigned int r = 0;

  while (v >>= 1) 
    r++;
  return r;
}

In your program you only need to check then

 if(log2(a) + log2(b) < MYLOG2UINTMAX)
   return a*b;
 else
   printf("Overflow");

The signed case is similar but has to take care of the negative case specifically.

EDIT: My solution is not complete and has an error which makes the test more severe than necessary. The equation works in reality if the log₂ function returns a floating point value. In the implementation I limited thevalue to unsigned integers. This means that completely valid multiplication get refused. Why? Because log2(UINT_MAX) is truncated log₂(UINT_MAX)=log₂(4294967295)≈31.9999999997 truncated to 31.

We have there for to change the implementation to replace the constant to compare to

#define MYLOG2UINTMAX (CHAR_BIT*sizeof (unsigned int))

This is a good answer, yes. But calculating log of 2 numbers each time multiply is called especially if that is in a loop means drastic reduction in performance as will not allow me to take advantage of cache — Cool_Coder, Jun 21 '14 at 07:51
True, but you will get performance hits of the same order in any of the solutions presented here. haccks answer requires a very costly division. — Patrick Schlüter, Jun 21 '14 at 08:01
@Cool_Coder You don't need to validate every number, only untrusted numbers provided by the user. — Brandin, Jun 21 '14 at 08:03
Furthermore, if it is to check user input you can forget about the performance cost. The user input time is several orders of magnitude slower. The naive log2 shown here doesn't take that much time, in my estimation it's in the order of a maximum of 100 cycles. This means you can calculate between ~10 000 000 and ~50 000 000 logs per second on any PC from the last 15 years. — Patrick Schlüter, Jun 21 '14 at 08:28

haccks · Answer 3 · 2014-06-21T07:35:56.007

2

You may try this:

if ( b > ULONG_MAX / a )        // Need to check a  != 0 before this division  
   return 0;                    //a*b invoke UB
else
   return a*b;

edited Jun 21 '14 at 07:35

answered Jun 21 '14 at 07:21

haccks

104,019
25
176
264

But the point is that UB has already occurred, you can't rely on the behaviour of this code. – Oliver Charlesworth Jun 21 '14 at 07:24
That's only safe on some processors, and relies on understanding the code generated by the compiler and whether the overflow traps or not. But, yes, with relevant testing in the relevant (production code), it should work. – Mats Petersson Jun 21 '14 at 07:25
@OliCharlesworth; Yes. You are right. Removed that. – haccks Jun 21 '14 at 07:27
1

I am unable to understand how this works. Can you please explain the logic in a line or two? – Cool_Coder Jun 21 '14 at 07:32
Taking small limit for the sake of convenience; suppose max limit is `255`. `a = 15` and `b = 15`, then `a*b = 225`. If `b = 10` and `a = 25` then ` b > max_limit / a` will become false as `max_limit / a` would become < 10. – haccks Jun 21 '14 at 07:41

How to catch undefined behaviour without executing it?

3 Answers3