4

I'm studying binary search algorithm and I've seen many times the algorithm written as follows (this is C++ but the language is not that important here):

    int start = 0;
    int end = vec.size() - 1;       
    do {
        int mid = (lo + hi) / 2;
        if (target < vec[mid])
          start = mid + 1;
        else if (target > vec[mid])
          end = mid - 1;
        else
          // found
    } while (start <= end);

However I've also seen implementations like this:

    int start = 0;
    int end = vec.size() - 1;       
    do {
        int mid = (int)ceil((lo + hi) / 2.0);
        if (target < vec[mid])
          start = mid + 1;
        else if (target > vec[mid])
          end = mid - 1;
        else
          // found
    } while (start <= end);

Both seem to work. Is there any correctness or performance reason why I should get the ceil and do that second case floating point arithmetic instead of using the first version?

Dean
  • 6,610
  • 6
  • 40
  • 90
  • 2
    Note that you can avoid floating-point with `(lo + hi + 1) / 2`. – Oliver Charlesworth Apr 27 '17 at 09:33
  • 1
    This is basically answered here I think: http://stackoverflow.com/questions/27655955/why-does-binary-search-algorithm-use-floor-and-not-ceiling-not-in-an-half-open (thus tempted to mark as duplicate). – Oliver Charlesworth Apr 27 '17 at 09:34
  • 1
    Note that your midpoint formula is *wrong* because `left + right` risks overflow. Better to do: `(int)(((unsigned)lo + (unsigned)hi) / 2)` which gives you the desired two's-complement wrap-around behavior. (See also [this answer](https://codereview.stackexchange.com/questions/152085/bisection-method/152136#152136), though it is in the context of C#. Plenty of information about this available online via Google.) – Cody Gray - on strike Apr 27 '17 at 09:38
  • 1
    @CodyGray: Actually, making **indices** `unsigned` (or better yet, `size_t`) *right from the start* would be the logical thing to do... – DevSolar Apr 27 '17 at 09:43
  • Kaidul's answer is correct, but see http://stackoverflow.com/questions/39416560/how-can-i-simplify-this-working-binary-search-code-in-c/39417165#39417165 – Matt Timmermans Apr 27 '17 at 12:14

1 Answers1

3

When int mid = (lo + hi) / 2:

You are deciding the mid element by taking the left element of the two potential middle elements when the array size between [left, right] is odd i.e. for array [4, 5] your mid will be 4. So without any ceil of floor, the division works pretty like floor.

When (int)ceil((lo + hi) / 2.0);:

You are deciding the mid element by taking the right element of the two potential middle elements when the array size between [left, right] is odd i.e. for [4, 5] your mid will be 5.

So both selection will work because you're discarding/taking a part based on some valid conditions (target < vec[mid] and target > vec[mid]), The partition point won't matter here that much.

Another thing is, during operation like int mid = (lo + hi) / 2 you may encounter overflow when adding lo and hi if the summation exceeds integer range. So safe is to write like mid = lo + (hi - lo) / 2 which will yield same output.

Hope it helps!

Edit

so both work only because I'm discarding the mid element from the new search range when restarting the search, right?

Yes. If you wouldn't discard the mid element, it will fall into infinity loop i.e. [4, 5], 4 would be always selected as mid and for call like left = mid, it would create an infinite loop.

Kaidul
  • 15,409
  • 15
  • 81
  • 150
  • so both work only because I'm **discarding** the mid element from the new search range when restarting the search, right? (otherwise there might be an infinite loop) – Dean Apr 27 '17 at 09:49