3

Background

An interactive book presents this function as an example of a binary search.

void GuessNumber(int lowVal, int highVal) {
   int midVal;            // Midpoint of low and high value
   char userAnswer;       // User response
   
   midVal = (highVal + lowVal) / 2;
   
   // Prompt user for input
   cout << "Is it " << midVal << "? (l/h/y): ";
   cin >> userAnswer;
   
   if( (userAnswer != 'l') && (userAnswer != 'h') ) { // Base case: found number
      cout << "Thank you!" << endl;                   
   }
   else {                                             // Recursive case: split into lower OR upper half
      if (userAnswer == 'l') {                        // Guess in lower half
         GuessNumber(lowVal, midVal);                 // Recursive call
      }
      else {                                          // Guess in upper half
         GuessNumber(midVal + 1, highVal);            // Recursive call
      }
   }
}

It presents the algorithm, and then they give an explanation about how to calculate the mid-value for the recursive call.

Because midVal has already been checked, it need not be part of the new window, so midVal + 1 rather than midVal is used for the window's new low side, or midVal - 1 for the window's new high side. But the midVal - 1 can have the drawback of a non-intuitive base case (i.e., midVal < lowVal, because if the current window is say 4..5, midVal is 4, so the new window would be 4..4-1, or 4..3). rangeSize == 1 is likely more intuitive, and thus the algorithm uses midVal rather than midVal - 1. However, the algorithm uses midVal + 1 when searching higher, due to integer rounding. In particular, for window 99..100, midVal is 99 ((99 + 100) / 2 = 99.5, rounded to 99 due to truncation of the fraction in integer division). So the next window would again be 99..100, and the algorithm would repeat with this window forever. midVal + 1 prevents the problem, and doesn't miss any numbers because midVal was checked and thus need not be part of the window.

Reasoning

I can see why, when the mid-value is used as a lower limit, the function is called as GuessNumber(midVal + 1, highVal). The explanation given with the limits 99 and 100 is pretty clear. However, I don't understand why, when the mid-value is used as the highest limit, the function is called as GuessNumber(lowVal, midVal) instead of GuessNumber(lowVal, midVal - 1).

The algorithm is missing the case when the value being searched is not within the range. However, it seems that they do make the assumption that it is (as a precondition). Therefore, the example they give with 4 and 5 does not make much sense.

Test case: the number being searched is 4

Let's assume that the value being searched is 4.

mid_value := (4+5) / 2 = 9 / 2 = 4.5 = 4 (due truncation) 

When the number is checked, it should return the position of 4, so there would be no error. The call GuessNumber(4, mid_value - 1) would have never been called. This means that the case for midVal < lowVal would have never occurred.

Test case: the number being searched is 4

Now, suppose the value is 5. The same calculation is done. When compared, the algorithm will execute the call GuessNumber(mid_value + 1, 5). This should return the position of 5. Again, GuessNumber(5, mid_value - 1) is not called.

Test case: changing the range

If I try to increase the range, let's say using 4 and 7 as limits, the function would never cause midVal < lowVal if called like GuessNumber(low_value, mid_value - 1). Consider the mid-value of the range between 4 and 7, namely 5 (due truncation). If the number being searched is 5, then the position is immediately returned. However, if the number being searched is 4 and the recursive call is done as GuessNumber(low_value, mid_value - 1) (GuessNumber(4, 5 - 1)), then the new mid-value will be 4 and no midVal < lowVal would occur. The postion of 4 is returned.

Some conclusions

I think it may be a logical error. The only way this could happen is if the number being searched is outside of the range (and specifically below the lower limit), but the algorithm is not testing a case when the number being searched is outside of the range. Again, it seems to be a precondition. Nevertheless, the explanation given caught my attention. They took the time to say that the error midVal < lowVal could happen, and they gave the example of the range 4 and 5.

Other Findings

I looked up the pseudocode in a discrete math book, but they use the case of recursive_binary_search(lowVal, midVal - 1) without worrying about the issue described above. I noticed they do some checking if the value is out of range, though.

procedure binary_search(i, j, x: integer, 1 ≤ i ≤ j ≤ n)
m := ⎣(i + j)/2⎦
if x = am then
    return m
else if (x < am and i < m) then
    return binary_search(i, m-1, x)
else if (x > am and j > m) then
    return binary_search(m + 1, j, x)
else return 0
{output is location of x in a1, a2, ..., an if it appears; otherwise it is 0}

I also saw this implementation in another data structures book. This does not make the precondition that the item being searched is inside the range, but they do check for that, and they still call the recursive function with the limits lower (first in this example) and mid - 1 (loc - 1 in this example).

void recBinarySearch(ArrayType a, int first, int last, ElementType item, bool &found, int &loc) {
    /*---------------------------
      Recursively search sub(list) a[first], ..., a[last] for item using binary search.

      Precondition: Elements of a are in ascending order; item has the same type as the array elements.
      Postcondition: found = true and loc = position of item if the search is successful; otherwise, found is false.
    -----------------------------*/

    if (first > last)
        found = false;
    else
    {
        loc = (first + last) / 2;
        if (item < a[loc])       // First half
            recBinarySearch(a, first, loc - 1, found, loc);
        else if (item > a[loc])  // Second half
            recBinarySearch(a, loc + 1, last, found, loc);
        else
            found = true;
    }
}

Question

I have searched on Google and other StackOverflow questions, but I cannot find something that points me in the right direction (most of the findings explain the overflow issue in the mid-value calculation, which is not the issue here). Is the explanation given in the book about using mid-value instead of mid-value - 1 for the upper limit correct? Is there an example that could demonstrate so, or am I missing something?

Thank you in advance for you time and help!

  • The code in question involves user input in a guessing game. Not an actual binary search algorithm. While I’m no CS professor, I imagine the `binary search` algorithm is sound and the guessing game just introduces complications you won’t get into with a straight algorithm – Taekahn Feb 24 '22 at 01:33
  • `mid - 1` is problematic to use in the case where the range is 0..1 and the indices are unsigned. In such a scenario, `mid` will be zero and if it's chosen as the upper bound, then `mid - 1` will wrap around to a huge value. – paddy Feb 24 '22 at 01:34
  • On a related note, `mid = (first + last) / 2` is naive and will not allow the full range of possible indices to be searched. For most people that might be okay, but it is technically limiting the algorithm and allowing very nasty bugs if that's ever exceeded. For `int`, the calculation limits the maximum size to `INT_MAX / 2`; for `unsigned int` it's limited to `UINT_MAX / 2`. In other words, only half of the available positive integers for the data type. The full range can be searched by calculating it as `mid = first + (last - first) / 2;` -- same as how you'd do it using pointers. – paddy Feb 24 '22 at 01:43
  • Thank you, @paddy There is a link ("overflow issue in the mid-value calculation") that discusses the issue you describe. Also, suppose the range is 0..1 and it is a precondition that the value must be within the range. then the mid value would be 0 (as you said). The recursion call is with mid - 1 is done after the mid value is checked. If it is 0, then the position is return, if it is 1, then the call will be done with mid + 1 and 1 (resulting in finding 1). Does this make sense? – Rodolfo Andrés Rivas Matta Feb 24 '22 at 01:56
  • @Taekahn I noticed it is like an introductory example, but what caught my attention was the explanation that we should use "mid" instead of "mid - 1" for recurrent calls. I think that explanation is flawed. Right? – Rodolfo Andrés Rivas Matta Feb 24 '22 at 01:58
  • It’s flawed when executing a normal binary search. It’s not flawed if you leave the choice up to user input. This is really a guess as to the writer’s intent though, but that’s what I’m seeing. – Taekahn Feb 24 '22 at 02:00
  • It is flawed in the sense that, because of that inclusion, one could answer *deceptively* at the right time and still ultimately arrive at guessing the number. Real binary-search always excludes the incorrect probed value (and up to half the partition in which it bookends). therefore, such a deception could not go unpunished. – WhozCraig Feb 24 '22 at 02:01
  • The guessing game is not really like a regular binary search. It is showing you how to modify the _indices_, assuming a 1:1 correspondence between the index and the value stored there. In that respect, your number _always_ existed in the search range somewhere. Or it _never_ existed there in the first place, which means you were simulating an out-of-bounds index. The logic presented in that example is probably tailored to this fact. It's a strange thing to warrant such a lengthy question. – paddy Feb 24 '22 at 02:06
  • I see you guys make good points, but assume that the value being searched is within range and that the user is irrelevant (he is always answering honestly). Would it still make sense using "mid" instead of "mid - 1" in the next recursive call? If so why? Why not just "mid"? Should I edit the question clarify that? – Rodolfo Andrés Rivas Matta Feb 24 '22 at 02:13
  • 1
    In a standard binary search, you generally check to see if your low has passed your high and you terminate your search if it does. So, no, it doesn't make sense to specifically code a scenario to ensure the high and low never cross _in the scenario you have detailed in your last comment_. At least, not to me, it doesn't. You can see this behavior (checking low < high) in the last example you posted. – Taekahn Feb 24 '22 at 03:06

1 Answers1

1

You're right to be confused by this example. With a range of 4..5, the guess (midVal) would be 4. The only way the line of code GuessNumber(lowVal, midVal-1); would be executed is if the user answered "low" which is:

  1. a lie, or
  2. their number is out of range.

The example code doesn't account for search values outside the initial input range, which a binary search should do.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880