Binary Search Middle Value Logic

Question

I've written binary search recursively with my own effort . While calculating middle index, I did int mid = (low + high) / 2. I googled recursive version of binary search and found that the calculation is a bug. Because it may overflow if (low + high) is greater than maximum value of integer. Two types solutions there are.

int mid = low + (high - low) / 2
int mid = (high + low) >>> 1 (unsigned right shift, it does division by 2)

Both of them work like a charm. However, I wonder for first one how it works. Yes, it works but what the logic that the people think to find it is. For the second, how it works. Isn't there possibility to overflow for (high + low) still? I mean my attempt is called as buggy because of overflow but the second one also does (high + low).

shouldnt it be (high+low)>>>1 ? – jackycflau May 16 '18 at 09:14 — jackycflau, May 16 '18 at 09:14
@jackycflau Sorry, fixed, yes – Soner from The Ottoman Empire May 16 '18 at 09:16 — Soner from The Ottoman Empire, May 16 '18 at 09:16

Oleg Sklyar · Answer 1 · 2018-05-16T18:58:01.563

3

The reason the first one works is

it is mathematically identical: l + (h - l)/2 = l + h/2 - l/2 = l/2 + h/2 = (l + h)/2, and
because of the precedence of operations: first the parenthesis, then the division by 2, then the addition.

Funny enough though, this "solution" may still overflow on a large negative low so it is not really a solution to your original problem.

You could do something along the lines of low/2 + high/2 to avoid overflow, but you will need to explicitly write out computations for even-even (the above), even-odd, odd-even and odd-odd to remain in the int space. While doing so you can utilize >>> to avoid the actual division.

edited May 16 '18 at 18:58

answered May 16 '18 at 09:11

Oleg Sklyar

9,834
6
39
62

Thank you. The identical really is helpful. – Soner from The Ottoman Empire May 16 '18 at 09:37
`low/2 + high/2` doesn't sound like a good workaround: consider the case where `low = 1` and `high = 3`: then `low / 2 + high / 2` would give `1` again (assuming integer division). – Mark Dickinson May 16 '18 at 18:46
@MarkDickinson I understand that but was not clear enough expressing the thought. Now corrected, please validate – Oleg Sklyar May 16 '18 at 18:58

harold · Answer 2 · 2018-05-16T09:52:40.007

3

In (high + low) >>> 1, there can still be signed wraparound, but the >>> operator treats its left operand as unsigned so signed wraparound is not relevant. This approach works because there won't be unsigned wraparound: high and low are (as signed integers) non-negative, so as unsigned integers they cannot be large enough to cause unsigned wraparound.

That adding two signed-non-negative integers never suffers from unsigned wraparound can be seen by adding the largest possible numbers: 2^k-1-1 + 2^k-1-1 (where k is the size in bits of the integer type, commonly 32), which adds up to 2^k-2. That doesn't wrap yet: the highest representable number is 2^k-1.

So it is not the "overflow" in high + low that really causes the problem, at least in a reasonable language where signed arithmetic is safe, such as Java. It is the subsequent treatment of the result of high + low as a signed integer by a hypothetical / 2 that follows it that causes trouble.

edited May 16 '18 at 09:52

answered May 16 '18 at 09:20

harold

61,398
6
86
164

I hadn't known _wraparound_ is synonym of _overflow_ before your answer. Thanks for your answer. **However, why cannot unsigned integers be large enough to cause unsigned wraparound?** Could you exemplify it? – Soner from The Ottoman Empire May 16 '18 at 09:37
@snr they're not *exact* synonyms, for example in C signed addition can overflow, but only *maybe* wraps, it can also cause various sorts of breakage. – harold May 16 '18 at 09:49
maybe wraps? **Signed integer variables do not have wrap-around behavior in C language.** https://stackoverflow.com/questions/19842215/wrap-around-explanation-for-signed-and-unsigned-variables-in-c – Soner from The Ottoman Empire May 16 '18 at 09:53
@snr that just means there is no guarantee, not that it definitely doesn't happen – harold May 16 '18 at 09:56
Yes right, **why cannot unsigned integers be large enough to cause unsigned wraparound? Could you exemplify it?** – Soner from The Ottoman Empire May 16 '18 at 09:58
I'm a tad bewildered. Thanks. En passant about wraparound I gotcha from [here](http://qr.ae/TUTMUT). – Soner from The Ottoman Empire May 16 '18 at 10:24

score 1 · Answer 3 · answered May 16 '18 at 19:32

Assume your integers are 16-bit signed (just for brevity; scale them up if you need 32-bit integers).

The range is -32768...32767.

An example where low + (high - low) / 2 will overflow:

low = -20000
high = 30000
high - low = 50000 = (after overflow) -15536
(high - low) / 2 = -7768
low - (high - low) / 2 = -27768

An example where (high + low) >>> 1 will overflow:

low = -20000
high = -10000
low + high = -30000 = (binary) 1000101011010000
(low + high) >>> 1 = (binary) 0100010101101000 = 17768

Calculating average without any possibility for overflow is inefficient, and can be performed in multiple ways:

Convert to higher "precision" first, e.g. 32-bit to 64-bit; convert back after calculating the average

Divide your numbers by 2 and use some sort of "remainder"

low2 = low >>> 1;
high2 = high >>> 1;
low_remainder = low & 1; // get the lowest significant bit
high_remainder = high & 1; // get the lowest significant bit
avg2 = low2 + high2; // this cannot overflow
avg = avg2 + (low_remainder + high_remainder) / 2; // this cannot overflow

I am not sure this exact code works; it's just an idea.

Calculate the average the usual way, then check for "unexpected" result (the average is smaller than low or greater than high). If unusual, calculate it the "safe" way.
Somehow make sure your input cannot cause overflow. This doesn't work for libraries, but often works in real applications, where you have an upper bound like 10000 on all your numbers. Or maybe your numbers are nonnegative; then low + (high - low) / 2 actually cannot overflow.

Thank you for another perspective. _maybe your numbers are nonnegative; then low + (high - low) / 2 actually cannot overflow._ yes it's right because array indexes commence from 0. — Soner from The Ottoman Empire, May 16 '18 at 20:10

Binary Search Middle Value Logic

3 Answers3