4

I've written binary search recursively with my own effort . While calculating middle index, I did int mid = (low + high) / 2. I googled recursive version of binary search and found that the calculation is a bug. Because it may overflow if (low + high) is greater than maximum value of integer. Two types solutions there are.

  • int mid = low + (high - low) / 2
  • int mid = (high + low) >>> 1 (unsigned right shift, it does division by 2)

Both of them work like a charm. However, I wonder for first one how it works. Yes, it works but what the logic that the people think to find it is. For the second, how it works. Isn't there possibility to overflow for (high + low) still? I mean my attempt is called as buggy because of overflow but the second one also does (high + low).

3 Answers3

3

The reason the first one works is

  • it is mathematically identical: l + (h - l)/2 = l + h/2 - l/2 = l/2 + h/2 = (l + h)/2, and
  • because of the precedence of operations: first the parenthesis, then the division by 2, then the addition.

Funny enough though, this "solution" may still overflow on a large negative low so it is not really a solution to your original problem.

You could do something along the lines of low/2 + high/2 to avoid overflow, but you will need to explicitly write out computations for even-even (the above), even-odd, odd-even and odd-odd to remain in the int space. While doing so you can utilize >>> to avoid the actual division.

Oleg Sklyar
  • 9,834
  • 6
  • 39
  • 62
3

In (high + low) >>> 1, there can still be signed wraparound, but the >>> operator treats its left operand as unsigned so signed wraparound is not relevant. This approach works because there won't be unsigned wraparound: high and low are (as signed integers) non-negative, so as unsigned integers they cannot be large enough to cause unsigned wraparound.

That adding two signed-non-negative integers never suffers from unsigned wraparound can be seen by adding the largest possible numbers: 2k-1-1 + 2k-1-1 (where k is the size in bits of the integer type, commonly 32), which adds up to 2k-2. That doesn't wrap yet: the highest representable number is 2k-1.

So it is not the "overflow" in high + low that really causes the problem, at least in a reasonable language where signed arithmetic is safe, such as Java. It is the subsequent treatment of the result of high + low as a signed integer by a hypothetical / 2 that follows it that causes trouble.

harold
  • 61,398
  • 6
  • 86
  • 164
1

Assume your integers are 16-bit signed (just for brevity; scale them up if you need 32-bit integers).

The range is -32768...32767.

An example where low + (high - low) / 2 will overflow:

low = -20000
high = 30000
high - low = 50000 = (after overflow) -15536
(high - low) / 2 = -7768
low - (high - low) / 2 = -27768

An example where (high + low) >>> 1 will overflow:

low = -20000
high = -10000
low + high = -30000 = (binary) 1000101011010000
(low + high) >>> 1 = (binary) 0100010101101000 = 17768

Calculating average without any possibility for overflow is inefficient, and can be performed in multiple ways:

  1. Convert to higher "precision" first, e.g. 32-bit to 64-bit; convert back after calculating the average
  2. Divide your numbers by 2 and use some sort of "remainder"

    low2 = low >>> 1;
    high2 = high >>> 1;
    low_remainder = low & 1; // get the lowest significant bit
    high_remainder = high & 1; // get the lowest significant bit
    avg2 = low2 + high2; // this cannot overflow
    avg = avg2 + (low_remainder + high_remainder) / 2; // this cannot overflow
    

    I am not sure this exact code works; it's just an idea.

  3. Calculate the average the usual way, then check for "unexpected" result (the average is smaller than low or greater than high). If unusual, calculate it the "safe" way.

  4. Somehow make sure your input cannot cause overflow. This doesn't work for libraries, but often works in real applications, where you have an upper bound like 10000 on all your numbers. Or maybe your numbers are nonnegative; then low + (high - low) / 2 actually cannot overflow.

anatolyg
  • 26,506
  • 9
  • 60
  • 134