8

Why does the midpoint algorithm for Binary Search use

low + (high-low)/2

rather than

(low + high)/2
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Intrepid Diamond
  • 462
  • 1
  • 6
  • 16
  • 2
    Because of this: [Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken](http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html) – Jeremy Mar 01 '16 at 03:57
  • 5
    Less chance of overflow. If the indices `high` and `low` are positive (non-negative), then the `low + (high - low) / 2` won't overflow, whereas `(high + low) / 2` can. OTOH, if the values can be positive or negative, for large enough values of different signs, you get overflow with `low + (high - low) / 2` and no overflow with `(low + high) / 2`. So, it pays to choose carefully. – Jonathan Leffler Mar 01 '16 at 03:59
  • 4
    Mind you, if you're talking about actual Python level code (not C implementation code), none of that really matters. Python has arbitrary precision integer math; they'll both work with no risk of overflow. – ShadowRanger Mar 01 '16 at 04:07
  • @ShadowRanger - There is a concept of infinity, though, and math doesn't work too well with it – OneCricketeer Mar 01 '16 at 04:13
  • @cricket_007: If either `high` or `low` are infinite, you're already in trouble. If `high` or `low` is so large that adding them blows main memory, you were already screwed. – ShadowRanger Mar 01 '16 at 04:21
  • 1
    I have a feeling this was written for C and ported to python without thinking. Both expressions give the same result, and the second one is actually faster. – Arif Burhan Mar 01 '16 at 04:46

2 Answers2

4

Your question is tagged for python, so I'll answer for python. In short, it doesn't:

https://hg.python.org/cpython/file/2.7/Lib/bisect.py

The pythonic implementation above found in the docs uses the latter construction. As people in the comments have pointed out, some languages need to respect overflow. Python isn't none of them and has arbitrary precision integers.

In the comments it was speculated that someone porting from a C-like language might copy the more acceptable construction for that language. This is possible. Someone else commented that one might be faster than the other; such a micro-optimization seems to be difficult to comment on in general.

But... what if they aren't Ints!

I have assumed that these are integers because for binary search, the indices are integers. If they are indeed not integers, then you are going to have some problems using them to access arrays. But in the mean time, you might experiene different results:

a = b = sys.float_info.max
print a + (a-b)/2 # prints a really big number
print (a+b)/2 # prints inf

Similarly,

a = b = float("inf")
print a+(a-b)/2 # prints nan
print (a+b)/2 # prints inf

That latter example is different, albeit it isn't clear to me which is better. For why this occurs, you can look at the overflow explanations in the article linked above.

Community
  • 1
  • 1
en_Knight
  • 5,301
  • 2
  • 26
  • 46
0

I searched this question on google and found very interesting answer on http://googleresearch.blogspot.in/2006/06/extra-extra-read-all-about-it-nearly.html

Here is example:

1:     public static int binarySearch(int[] a, int key) {
2:         int low = 0;
3:         int high = a.length - 1;
4:
5:         while (low <= high) {
6:             int mid = (low + high) / 2;
7:             int midVal = a[mid];
8:
9:             if (midVal < key)
10:                 low = mid + 1
11:             else if (midVal > key)
12:                 high = mid - 1;
13:             else
14:                 return mid; // key found
15:         }
16:         return -(low + 1);  // key not found.
17:     }

The bug is in this line:

int mid =(low + high) / 2;

In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

This bug can manifest itself for arrays whose length (in elements) is 230 or greater (roughly a billion elements). This was inconceivable back in the '80s, when Programming Pearls was written, but it is common these days at Google and other places. In Programming Pearls, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962." The truth is, very few correct versions have ever been published, at least in mainstream programming languages.

So what's the best way to fix the bug? Here's one way:

 int mid = low + ((high - low) / 2);
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Piyush Gupta
  • 2,181
  • 3
  • 13
  • 28
  • This question is tagged python... How does your answer apply (I'm not being facetious, it might apply)? – en_Knight Mar 01 '16 at 05:05
  • @en_Knight Read question carefully, he want to explanation of the Midpoint Algorithm. and here is good explanation. – Piyush Gupta Mar 01 '16 at 05:09
  • Hmm I did read the question, but it is also tagged with python right now (12:10am eastern time, the 1st); since python doesn't have restricted precision for integers, aren't some the reasons described in your post not applicable? – en_Knight Mar 01 '16 at 05:11
  • Jeremy Banks linked to this article an hour before you posted your answer. Your answer is basically a link and a verbatim quote from the article — probably, more of the answer should be quoted than the edit I've made. Another answer, with less quote, was deleted about 50 minutes before you answered — granted, you can't see that. Are you sure this is a good idea? – Jonathan Leffler Mar 01 '16 at 05:11