10

Hey I had this question in an interview and was wondering what was the best way to solve it. So say you are given an array that is already sorted and you want to find the lowest index of some value x.

Here is a python/pseudocode of what I came up with, I am just wondering if there is a better way to go about it?

def findLowestIndex(arr, x):
     index = binarySearch(0, len(arr), x)
     if index != -1:
         while index > 0:
           if arr[index] == arr[index-1]:
              index -= 1
           else:
              break
     return index

Thanks!

amit
  • 175,853
  • 27
  • 231
  • 333
mike
  • 3,339
  • 6
  • 30
  • 34
  • 3
    I'm presuming they asked you not to use `[1,2,3].index(2)`? Otherwise, any method seems like overkill. – g.d.d.c Apr 13 '12 at 21:59
  • Well I had a couple of different languages I could have written it in so I wanted something not specific to just python. I would imagine that the array.index(x) function is highly optimized but that function cant make any assumption about the state of the array (I know its already sorted) so would a binary search be more effective? – mike Apr 13 '12 at 22:06
  • check the first answer to [this SO question](http://stackoverflow.com/questions/212358/binary-search-in-python) – James Thiele Apr 13 '12 at 22:11

8 Answers8

8

Your method takes linear time in the worst case, which is when the count of xs in the array is O(n).

An O(lg n) solution can be obtained by changing the binary search itself to find the first x in the array instead of just any one of them:

def binary_search(x, a):
    lo = 0
    hi = len(a)

    while lo < hi:
        mid = (lo + hi) // 2

        if a[mid] < x:
            lo = mid + 1
        elif a[mid] > x:
            hi = mid
        elif mid > 0 and a[mid-1] == x:
            hi = mid
        else:
            return mid

    return -1
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
3
import bisect
l = [1,2,3,4,4,4,4,4,5,5,5,5,5,6,6,7,7,7,8,8]
bisect.bisect_left(l, 4)

EDIT: I just miss a thing. the bisect will give you a insert point. so if x is not in the list you will still have a result index. So you need to check if x is in the list first:

if x in l:
    ....

but for interview question, they may want to see how you come up with the algorithm instead of using the library...

wilson
  • 177
  • 1
  • 7
  • This is a good answer, but from what I understand the op wanted to write a method that could be easily implemented in different languages(not one specific to python). Still, +1 – Nolen Royalty Apr 13 '12 at 22:13
  • 1
    Instead of using `x in l`, which negates the efficiency gain from using `bisect_left`, you ought to check that both `0 <= index < len(l)`, and that `l[index] == f`, where index is the result of `bisect.bisect_left`, and f is the "given value". – Casey Kuball May 02 '12 at 14:48
1

If the elements are integers - or enumerated, you can do it a bit faster:

Note that in binary search [the algorithm, not the python function], if an element doesn't exist - you can find the smallest element that is bigger then the index.

  1. first search for x - and get the index, let it be i.
  2. next, search for x-1. If it is not in the list, the binary search can find you the first index if x.
  3. If it is in the list, let the index found be j:
    • Do a binary search on the sublist from j to i, and search for an element such that list[k] < list[k+1]

For not enumerated values, it can also be done by the same idea of decreasing ranges while list[k] < list[k+1] and list[k+1] == x but I find it simpler to understand first how it is done for integers, and then applying it for general solution.

Note that this solution is O(logn), while the trivial solution you propose is O(n), in list with a lot of dupes, because of the iterative step after the binary search.

amit
  • 175,853
  • 27
  • 231
  • 333
1

deferred detection of equality approach in binary search gives the smallest index, reducing equality branches.

def binary_search(low, high, target, array):
    while low < high:
        mid = low + (high - low) / 2
        if a[mid] < target:
            low = mid + 1
        else:
            high = mid

    if (array[low] == target) return low
    else return -1
pepero
  • 7,095
  • 7
  • 41
  • 72
  • From the wiki link, they have an extra check outside of the loop - to handle cases where array is empty. You might add it for completeness. – nawfal Jun 15 '14 at 18:43
1

A recursive version, if anyone's interested...

Rewrite from https://github.com/reneargento/algorithms-sedgewick-wayne/blob/master/src/chapter1/section4/Exercise10.java, a exercise from Algorithms 4th.

def binary_search(key, a, low, high):
    if low > high:
        return -1;
    middle = low + (high - low) / 2;
    if a[middle] < key:
        return binary_search(key, a, middle + 1, high)
    elif a[middle] > key:
        return binary_search(key, a, low, middle - 1)
    else:
        if (middle - 1 >= 0) and (a[middle - 1] != key):
            return middle
        else:
            index = binary_search(key, a, 0, middle - 1)
            if(index == -1):
                return middle;
            else:
                return index;

a = [1,2,3,3,3,3,4,5,6,7,7,8,9]

print(binary_search(7, a, 0, len(a)))

Doesn't recursive version always look more straightforward than non-recursive version? Why does this looks much harder..? Can anyone write a better recursive version :D?

Rick
  • 7,007
  • 2
  • 49
  • 79
0

If x is not in X such that f(x) = v then the answer is trivial: Binary search to find that out.

If there is one x such that f(x) = v then the answer is also trivial: Binary search to find that out.

The problem is only interesting if there are multiple x's such that f(x) = v. If there are a constant number of x's then algorithmically a binary search is optimal. Just binary search and check lower indices sequentially.

What if, though, there are a lot of these x's? A sequential search like that is obviously not optimal. In fact, if there are c * |X| x's then this runs in O(|X|).

Instead what could be done is initialize lbound to 0 and binary search until you find the element, at i, where every time you go right, update lbound to the mid that was just used. Then binary search from [lbound, i - 1]. Do this until i == lbound or you don't find an element. If the former occurs, the desired index is 0. If the latter occurs, the desired index is the previous i used. The worst case is the desired index is 0.

The interesting thing is this still runs in log(|X|) time, I think.

Jesus is Lord
  • 14,971
  • 11
  • 66
  • 97
-1

I'd bet good money that g.d.d.c's comment is the fastest answer for python. Otherwise, your general algorithm is correct, except for the fact that in some cases you can beat the O(log n) behavior of binary search. Specifically, in the case of integers the best worst-case behavior you can get is O(sqrt(log n)): https://stackoverflow.com/questions/4057258/faster-than-binary-search-for-ordered-list

Community
  • 1
  • 1
tel
  • 13,005
  • 2
  • 44
  • 62
  • -1; `list.index` is quite easy to beat with a pure Python binary search on long lists. When searching for the element 6000000 in the list `range(10000000)`, binary search is 20000 times faster then `list.index`, according to `timeit`. – Fred Foo Apr 13 '12 at 22:23
  • @larsmans perhaps that's true, but unlike binary search list.index(x) is guaranteed to return the FIRST element in the list which matches x, which goes back to the op's question. What happens when you time list.index(x) vs the op's posted code? – tel Apr 13 '12 at 22:35
  • The binary search algorithm in my answer, which is what I timed, also finds the first `x`, as does [`bisect.bisect_left`](http://docs.python.org/library/bisect.html#bisect.bisect_left) followed by a check to find out if the element is actually in the list. – Fred Foo Apr 13 '12 at 22:39
-1

Modify binary search to find any occurrence of x for the first time.

user1320006
  • 25
  • 1
  • 6