28

I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

//ripped from the JDK
public static int binarySearchValue(InvertedContainer.InvertedIndex[] a, long key) {
    return bSearchVal(a, 0, a.length, key);
}

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                                 int toIndex, long key) {
    int low = fromIndex;
    int high = toIndex - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        long midVal = a[mid].val;

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return (low); // key not found. return insertion point
}
Lazer
  • 90,700
  • 113
  • 281
  • 364
Amir Afghani
  • 37,814
  • 16
  • 84
  • 124
  • 1
    Cool - I get to rep-whore my own question and answer... http://stackoverflow.com/questions/4948162/how-can-i-better-understand-the-one-comparison-per-iteration-binary-search explains a form of the binary search that can find the first item > or >=, or the last item < or <=. –  Jul 13 '11 at 08:57
  • Hah, thanks! I'll take a look. Every once in a while I notice something like this and I think 'you know nothing'. – Amir Afghani Jul 13 '11 at 08:59

16 Answers16

60

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else if (low != mid) //Equal but range is not fully scanned
        high = mid; //Set upper bound to current number and rescan
    else //Equal and full range is scanned
        return mid;
bezmax
  • 25,562
  • 10
  • 53
  • 84
19

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Hi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no? – Amir Afghani Jul 13 '11 at 08:55
  • BTW, I'm a big fan. Thanks for your response so far. – Amir Afghani Jul 13 '11 at 08:56
  • 1
    @Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer *another* method which will find the first one, by calling the original method and *then* performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using `~`) to indicate a failure to find the value, but also indicating the insertion point at the same time. – Jon Skeet Jul 13 '11 at 08:59
  • 2
    This solution has O(N) time complexity as there can be up to N elements with the value you are searching for. – Juan Martinez Jan 06 '15 at 17:30
  • 1
    @JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer. – Jon Skeet Jan 06 '15 at 18:15
5

If your data is all integral, then this hack can help. It uses a float array to store values.

float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

import java.util.Arrays;

public class BinarySearch {
    // considering array elements are integers
    float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

    public void returnFirstOccurrence(int key) {
        int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
        if (ar[firstIndex] != key)
            System.out.println("Key doesn't exist");
        else
            System.out.println("First index of key is " + firstIndex);
    }

    public static void main(String Args[]) throws Exception {
        new BinarySearch().returnFirstOccurrence(9);
    }

}

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

Arnab
  • 555
  • 1
  • 7
  • 12
  • Could you explain? How does this get the first index of occurrence? – nawfal Jun 15 '14 at 21:53
  • Binary search implementation in java Collections either returns the index of the number OR if the number is not present it returns the index of position where the number can be inserted. see [link](http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#binarySearch(float[],%20float)). Also I've edited to include an example. – Arnab Jul 15 '14 at 09:51
  • 1
    got it. It's not just hacky but extremely hacky :) Not only that it works only for integers, but still you would want a `float[]` to hold them all. In case the client originally has an `int[]`, then creating a new `float[]` might have some cost. You might want to make the two conditions **written in bold** :) – nawfal Jul 15 '14 at 10:17
  • 1
    As I said, I only use it in contests. I'm a student and don't have any industrial experience but I agree it would be very inappropriate and utterly confusing to use it in a production code. – Arnab Jul 23 '14 at 11:12
  • no to me, using this kind of solution in production is better than using some ad hoc algo on an int[] ... there is an O(n) conversion cost if you do this once but if you create the array as float[] from the beginning and maintain it during lifetime of your program and run the search many times, there will be no significant performance diff. – mcvkr Jun 28 '21 at 13:51
  • There are two issues with your example code. #1 If `firstIndex == ar. length` (because `key > ar[ar.length-1]`) this throws an exception. #2 `ar[firstIndex] != key` is risky as you shouldn't be comparing floats with `==` or `!=` – Ricola Jun 11 '22 at 16:38
3

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

Jiri Kriz
  • 9,192
  • 3
  • 29
  • 36
  • 1
    Although this is called a lower bound in the C++ library, it's still a binary search - at least according to my old copy of Algorithms and Data Structures (Modula 2 Edition) by Niklaus Wirth. Maybe it's a matter of opinion, though, where the boundary is between "different algorithm" and "variant of the same algorithm". –  Jul 13 '11 at 09:09
  • "Binary search" as implemented by many (most?) libraries (e.g. C, C++/STL, Java) does not specify which index is returned when multiple keys are present. This was also the problem in the posed question. "Lower bound" specifies exactly the result. – Jiri Kriz Jul 13 '11 at 09:21
  • Agreed, but good names for library functions aren't necessarily the same as in the algorithms textbook, especially when the library may have several variants. BTW, I didn't mean to claim you're wrong about anything, and I upvoted your answer. –  Jul 13 '11 at 09:45
3

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

public static int binarySearch(int[] a, int key) {
    int low=0,high=a.length-1;
    while (low<=high) {
        int mid=(low+high) >>> 1;
        int midVal=a[mid];
        if (midVal < key) 
            low=mid+1;
        else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
            high=mid-1;
        else if (midVal==key) //found the 1st key 
             return mid;
        else
            return ~mid;      //found insertion point
    }
    return ~(a.length);       //insertion point after everything
}

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

paperhorse
  • 4,095
  • 2
  • 22
  • 12
1

here is the solution, i found for getting the lower index of key having multiple occurrences in a sorted array using binary search.

int lowerBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]< key) low=mid;
        else high=mid;
    }
    int p = high;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

we have to change a little in this code to work for upper bound, using binary search, so here is the working copy of code.

 int upperBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]> key) high=mid;
        else low=mid;
    }
    int p = low;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}
rykhan
  • 309
  • 4
  • 15
1

One approach is to hold an invariant throughout the whole binary search. In your particular case, the invariant would be:

  • array[low] < key
  • key <= array[high]

Then you can minimize the gap between low and high using binary search. When low + 1 == high, high would be the answer. Example code in C++:

// check invariant on initial values.
if (array[low] >= key) return low;
if (array[high] < key) return high+1;
// low + 1 < high ensures high is at least low + 2, thus
// mid will always be different from low or high. It will
// stop when low + 1 == high.
while (low + 1 < high) {
  int mid = low + (high - low) / 2;
  if (array[mid] < key) {
    low = mid;   // invariant: array[low] < key
  } else {
    high = mid;  // invariant: array[high] >= key
  }
}
return high;

Key difference between this and your example code is to update low and high to only mid rather than mid+1 or mid-1, because we have checked the value of array[mid], we can guarantee the invariant still holds when updating the boundaries. You need to check the invariant on initial values before starting to search too.

pherl
  • 11
  • 2
1

In this thread, you can find a full example of the binary search (recursive version), and two other versions (based on the original one) which allow you to get the first index and last index of a given key.

For your convenience I added the relevant Junit tests.

Naor Bar
  • 1,991
  • 20
  • 17
1

The following algorithm binary-searches for the first item with a key greater-than-or-equal-to your search key...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] >= goal)
  {
    //  new best-so-far
    upperbound = testpos;
  }
  else
  {
    lowerbound = testpos + 1;
  }
}

This isn't written for Java, which I don't know that well, so it may need a minor tweak. Note that the bounds are half-open (lowerbound is inclusive and upperbound is exclusive) and that this is important for correctness.

This can be adapted to other similar searches - e.g. finding the last key <= the search value.

This is slightly modified from my earlier question-and-answer here.

Community
  • 1
  • 1
  • 1
    while it's trivial to extend, just want to point out that OP wants the first occurrence "equal to", not "greater than or equal to". – nawfal Jun 15 '14 at 20:41
0

Here's a variation of the solution in scala. Used pattern-matching and recursion instead of the while loop to get the first occurrence.

def binarySearch(arr:Array[Int],key:Int):Int = {
     def binSearchHelper(lo:Int,hi:Int,mid:Int):Int = {
        if(lo > hi) -1 else {
            if(arr(mid) == key) mid else if(arr(mid) > key){
                binSearchHelper(lo,mid-1,lo + (((mid-1) - lo)/2))
            }else{
                binSearchHelper(mid+1,hi,(mid+1) + ((hi - (mid+1))/2))
            }
        }
     }
    binSearchHelper(0,arr.size-1,(arr.size-1)/2)
}

def findFirstOccurrence(arr:Array[Int],key:Int):Int = {
    val startIdx = binarySearch(arr,key)
    startIdx match {
        case 0 => 0
        case -1 => -1
        case _ if startIdx > 0 => {
            if(arr(startIdx - 1) < key) startIdx else {
                    findFirstOccurrence(arr.slice(0,startIdx),key)
            }
        }
    }
}
sc_ray
  • 7,803
  • 11
  • 63
  • 100
0

This should do the trick

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                             int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;
int result = low;
while (low <= high) {
    int mid = (low + high) >>> 1;
    long midVal = a[mid].val;

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else
    {
        result = mid;
        high = mid -1; 
    }
}
return result; 

}

Praveen Reddy Katta
  • 183
  • 1
  • 1
  • 11
0

For the last occurrence of the element:

static int elementExists(int input[], int element){
    int lo=0;
    int high = input.length-1;
    while(lo<high){
        int mid = (lo + high )/2;
        if(element >input[mid] ){
            lo = mid+1;
        }
        else if(element < input[mid]){
            high= mid-1;
        }
        else if (high != input.length-1) //Change for the Occurrence check
            lo = mid;
        else {
            return mid;
        }
    }
    return -1;
}

For the first occurrence:

else if (lo != mid){
        high = mid;
}

0

I think a simpler approach is storing the latest mid index where xs[mid] == key into a result variable and then keep running the binary search.

Here's the swift code:

func first<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { hi = mid - 1; res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Also, this requires a really small change (just one line) if you were to find the last index of a key.

func last<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { lo = mid + 1;  res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}
HepaKKes
  • 1,555
  • 13
  • 22
0

Try this javascript recursive solution. It's optimal in a sense that it's O(log(N))

function solve(A, e) {
  function solve (A, start, end, e, bestUntilNow) {
    if (start > end) {
      if (A[start] === e)
        return start
      return bestUntilNow
    } else {
      const mid = start + Math.floor((end - start) / 2)
      if (A[mid] === e) {
        return solve(A, start, mid - 1, e, mid)
      } else if (e < A[mid]) {
        return solve(A, start, mid - 1, e, bestUntilNow)
      } else {
        return solve(A, mid + 1, end, e, bestUntilNow)
      }
    }
  }
  return solve(A, 0, A.length, e, -1)
}
GA1
  • 1,568
  • 2
  • 19
  • 30
0

Given a sorted array with possibly duplicated elements like [1,1,2,2,2,2,3], the following code with time complexity O(logn) finds the indexes of the first and last occurrences of an element in the given array. This approach is based on a recursive JS Binary Search implementation by comparing with the immediately lower or immediately higher index/element after matching initially the element (as it is also suggested below).

// Find the first occurence of the value using binary search
function binarySearchFirstOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle-1] < value || middle === 0)) {
    return middle;
  } else if (arr[middle] < value) {
    return binarySearchFirstOccurence(arr, middle + 1, right, value);
  } else {
    // Going lower
    return binarySearchFirstOccurence(arr, left, middle - 1, value);
  }
}

// Find the last occurence of the value using binary search
function binarySearchLastOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle+1] > value || middle === arr.length - 1)) {
    return middle;
  } else if (arr[middle] > value) {
    return binarySearchLastOccurence(arr, left, middle - 1, value);
  } else {
    // Going higher
    return binarySearchLastOccurence(arr, middle + 1, right, value);
  }
}

function sortedFrequency(arr, value) {
  let left = 0;
  let right = arr.length -1;
  let first = binarySearchFirstOccurence(arr, left, right, value);
  let last = binarySearchLastOccurence(arr, left, right, value);
  if (first === -1 && last === -1) {
    return -1;
  } else if (first === -1 && last !== -1) {
    return 1;
  } else {
    return last-first+1;
  }
}

let arr = [1,1,2,2,2,2,3];
console.log(sortedFrequency(arr, 3));

Best George

0

The solution is doing some changes in the binary search so that time complexity remains equal to O(log2n). Keep in mind that for binary search the array must be in sorted form. Following are the steps

  • Calculate the mid index
  • If the element to be searched is greater than element present at the mid index then low becomes mid + 1
  • If the element to be searched is smaller than element present at the mid index then high will become mid - 1.
  • Now lets make some changes. Suppose if two elements are same in an array so our aim is to return the first occurrence of that element. Case would be if mid index == 0 or arr[mid - 1] != arr[mid], then mid index is returned. This means that same element is not repeated and if arr[mid - 1] == arr[mid], then one will make high = mid - 1. The reason being one need to return the previous element as it is the first occurrence.

Code

def first_occurence_binary_search(arr, search_element):

n = len(arr)
low = 0
high = n - 1

while low <= high:

    mid = (low + high) // 2
    if arr[mid] > search_element:
        high = mid - 1

    elif arr[mid] < search_element:
        low = mid + 1

    else:
        if mid == 0 or arr[mid - 1] != arr[mid]:
            return mid
        else:
            high = mid - 1

return -1


array = [1, 3, 3, 3]
searching = 3
print(first_occurence_binary_search(arr=array,
                                    search_element=searching))
Sanpreet
  • 103
  • 8