3

Here, I have an array:

{1, 3, 5, 6, 7, 8, 9, 11, 13, 14, 15}

Simple enough. However, we will be using a binary search to search for which index the integer 3 is on. How many comparisons would it take the computer?

See, I think it's three comparisons, but somehow that is incorrect. Would anyone kindly explain how many comparisons the computer takes, and why? I'm relatively new to programming, and I'm not grasping the concept very well.

3 Answers3

4

A binary search algorithm on that sequence would proceed as follows. So, we're looking for 3, we compare the element in the middle of the sequence with 3.

{1, 3, 5, 6, 7, 8, 9, 11, 13, 14, 15}
                =?
                3

They are not equal, so take the left sub-sequence and we compare 3 against its middle element, that is

{1, 3, 5, 6, 7}
       =?
       3

They are still not equal, then we go to the left sub-sequence, that is

{1, 3}

and we compare against the middle element, but we have a list of size 2! If we choose as middle element 1, then we would need to make another comparison, i.e. we would need to recurse on the right sub-sequence, i.e. {3}. In that case we would have 4 comparisons!

But wait, there's also another trick, you also need to check for the base case at every iteration or recursion call and these account for other comparisons!

nbro
  • 15,395
  • 32
  • 113
  • 196
2

A binary search relies on the fact, that the array is already sorted.

You take the element in the middle, which you can compute with mid = (left + right) / 2 and compare array[mid] with the element you are searching, 3 in this case.

  • If array[mid] is greater than 3, 3 must be on the left.

  • If it is smaller, then 3 will be on the right, if it is contained in the array.

This yields a new smaller interval with half the size.

You half the size till you reach an interval of size 1. These are log n steps, log n being the number of times you can divide n by 2 until you reach 1.

Take a look at this answer, which explains the mathematical details. It is important to note that the used logarithm has the base of 2.

Community
  • 1
  • 1
mike
  • 4,929
  • 4
  • 40
  • 80
1

The number of comparisons is higher than a simple binary subdivision of a range of numbers, because you need to check for <, = and > the current node.

Pseudocode:

for node, 

    if x < node, search left
    else if x > node, search right
    else x == node (stop searching)

It depends on what order you do your checks. The program needs to check up to two comparisons per node. Since x == node for only one node ever it makes the most coding sense to put that one as the default condition (because of branch prediction, you want more likely branches to be tested first).

So on the first pass, node 8, it would find (x < 8) true, the move to node 5, find (x < 5) true, then move to the left.

Now, what's to the left could be either 1 or 3 (since it's an unfilled subtree, it could be either). If it's 1, then move to node 1 find (x < 1) false, then find (x > 1) true, then move to node 3.

At this stage we're at node 3. We find (x < 3) false, then check for (x > 3), also false, thus deducing x == 3. Shortest possible path is 4 checks, but if we had to stop by node 1 (equally likely) we did this is 6 checks.

Note, that I deliberately biased this for checking (x < node) first, and if you check them in a different order then the average comparison count would shoot up. e.g. if you check equality first, then you add additional unneeded checks:

for node, 

    if x == node (stop searching)
    else if x < node, search left
    else x > node, search right

So on the first pass, node 8, it would find (x == 8) false, then find (x < 8) true, then find (x == 5) false, then find (x < 5) true, then (50% chance) go to node 1, find (x == 1) false, then find (x < 1) false, then find (x == 3) true. So either 5 or 7 checks this way. If you search right branches first, then you get a different measurement.

Jason Lang
  • 1,079
  • 9
  • 17