1

Given a sorted array A and an element x, I need to find an algorithm that returns the index of x in A or -1 if x is not in A. the time complexity of the algorithm should be Θ(logd) when d is the number of elements that appears before x in A, or if x is not in A, d is the number of elements that were before x if he was in A.

Binary search is not good enough because its best case is O(1). I thought of starting from the beginning of the array, and start checking the indexes that are powers of 2. but I got lost.

Genadi
  • 249
  • 2
  • 6
  • 2
    Binary search worst case *is* O(log n)… – taylor swift Nov 25 '16 at 17:38
  • If it was a `log(size(A))` you wanted, binary search would work. If you're trying to optimize in terms of the location of `x`, my gut says check powers of `2` until you find a value larger than `x`, then binary search in the top half, but I'm not completely clear on what you're asking. – Jesus is Lord Nov 25 '16 at 17:40
  • @taylorswift I know, but I need `Θ(logd)` which means the best case is logd too, and its not correct in binary search. – Genadi Nov 25 '16 at 17:43
  • 1
    @Genadi , I’m not sure if you understand what best case and worst case complexity means, *worst case* means that the algorithm can *never* take longer than O(log n) to perform this search, no matter the input. If you need an O(log n) algorithm, than every O(1) algorithm satisfies that since O(1) <~ O(log(n)). – taylor swift Nov 25 '16 at 17:46
  • @taylorswift its Theta(logd). please read carefully. – Genadi Nov 25 '16 at 17:48
  • 1
    According to your notation, you mean to say that the average time complexity should be `logd`. The best case for any algorithm can be `O(1)` by sheer chance/coincidence – User_Targaryen Nov 25 '16 at 17:49
  • @Genadi so if I’m reading this correctly, you want an algorithm that dallies for Θ(log(n)) time no matter what, even if it already found the answer early? I don’t know why you would ever need such a thing, but you could always just use `sleep()` with the duration calculated once you find the index of the element… – taylor swift Nov 25 '16 at 17:53
  • @taylorswift No. I want an algorithm that in best case, would run in `Ω(logd)` and in worst case would run in `O(logd)` that is the definition of `Θ(logd)`. For example if `x` was the second element in the array, I would like to find him after 1 action and that is not the case using binary search. Got it now? – Genadi Nov 25 '16 at 18:05

3 Answers3

1

You can do it like this: It uses Θ(log d) steps to find a range of size Θ(d), and then does a binary search in that range in another Θ(log d) steps.

int search(int[] array, int length, int valueToFind)
{
    int pos=0;
    int limit=min(length,1);
    while(limit < length && array[limit] < valueToFind)
    {
        pos=limit+1;
        limit = min(length, limit*2+1);
    }
    while(pos<limit)
    {
        int testpos = pos+((limit-pos)>>1);

        if (array[testpos]<valueToFind)
            pos=testpos+1;
        else
            limit=testpos;
    }
    return (pos < length && array[pos]==valueToFind ? pos : -1);
}

Note that the binary search I use does not exit early, and always takes Θ(log (limit-pos)) time. Even so it's faster than other searches that do exit early, because it does only one comparison per iteration. I describe other advantages here:

How can I simplify this working Binary Search code in C?

Community
  • 1
  • 1
Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • That's nice! what does the operator `>>` means? – Genadi Nov 25 '16 at 18:51
  • 2
    @Genadi >>1 means shift right by 1 bit. It's the same as /2, except the division always rounds down instead of rounding toward 0, i.e., 5>>1 == 2 and -5 >>1 == -3. The difference is not important in this application -- I could have written /2, but I'm old and I remember a time when there was a performance difference. – Matt Timmermans Nov 26 '16 at 05:03
1

I have a simple python implementation based on the power of 2's approach as discussed in the comments section. Please have a look:

def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)/2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i/2,l-1,x)

def main():
  line = raw_input("Enter numbers: ").split()
  nums = []
  for ele in line:
    nums.append(int(ele))

  nums = sorted(nums)
  print "Sorted array: ",nums
  x = int(raw_input("Enter the element to find in sorted array: "))
  print find_index(nums, x)

main()

Firstly, it tries to find the target element by moving over indexes with a power of 2.

At any point if the current element exceeds the target element, then we do a binary search between the current index(current power of 2) and the previous index(previous power of 2).

The time complexity of the searching process is logd on an average. Also the best case time complexity is logd as expected!!

Hope it is easy to understand!!!!

User_Targaryen
  • 4,125
  • 4
  • 30
  • 51
0

If to compare different methods to find value in 1d sorted array, there is no strong winner for array sizes =<10000, each method has its own 'cons' depending on target position, array size and numpy/list option. However, for large numpy arrays(>1M) it is clear that bisect and searchsort are winners.

There are 5 methods bellow: imported bisect with verification, python loop, numba solution tal's answer at Numpy: find first index of value fast, User_Targaryen's answer above, np.searchsort with verification. All of them accept numpy/lists, except tal's answer, where the to-numpy-conversion is required for lists. Only bisect and find_index methods feels fine resisting to lists len increase.

Bellow is an example for 10K numpy array:

import numpy as np
from numba import jit
from timeit import timeit
import bisect
   
# bisect.bisect
def bisect_(nums,x):
    pos = bisect.bisect_left(nums, x)
    try: 
        if nums[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5) 
        return -1


# python loop
def find_first(vec, item):
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# tal's answer at https://stackoverflow.com/questions/7632963/numpy-find-first-index-of-value-fast
@jit(nopython=True)
def find_firstj(vec, item):
    """return the index of the first occurence of item in vec"""
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# User_Targaryen's answer above
def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)//2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i//2,l-1,x)

# numpy searchsort
def searchsort(u, x):
    pos=np.searchsorted(u, x)
    try: 
        if u[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5)
        return -1

SIZE = 10000
u = np.arange(SIZE)

# at the beginning
print('---- at the beginning ----')
x = 40

print('bisect.bisect', timeit(lambda: bisect_(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the middle
print('---- in the middle ----')
x = 5040

print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the end
print('---- at the end ----')


 x = 9940
    
    print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
    print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
    print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
    print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
    print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

For 10K numpy array the result is:

    ---- at the beginning ----
bisect.bisect 0.009455299004912376 s
find_first 0.01455735880881548 s
find_firstj 0.023144721053540707 s
find_index 0.010857589077204466 s
searchsort 0.005023379344493151 s
---- in the middle ----
bisect.bisect 0.002835090272128582 s
find_first 1.3125479291193187 s
find_firstj 0.003973201382905245 s
find_index 0.01642579911276698 s
searchsort 0.007164010312408209 s
---- at the end ----
bisect.bisect 0.0030333898030221462 s
find_first 3.0728489509783685 s
find_firstj 0.006884939502924681 s
find_index 0.01768168993294239 s
searchsort 0.007541531231254339 s

For 1.5M array (x = 40/750040/1499400) the result is:

---- at the beginning ----
bisect.bisect 0.004126268904656172 s
find_first 0.009320289827883244 s
find_firstj 0.0005324906669557095 s
find_index 0.006837178952991962 s
searchsort 0.004959029611200094 s
---- in the middle ----
bisect.bisect 0.003964949864894152 s
find_first 166.71787671046332 s
find_firstj 0.5532677914015949 s
find_index 0.024038420524448156 s
searchsort 0.004650468472391367 s
---- at the end ----
bisect.bisect 0.0036938488483428955 s
find_first 329.8780040605925 s
find_firstj 1.570841120555997 s
find_index 0.026646379847079515 s
searchsort 0.00493861036375165 s

Please, copypaste to check for other sizes and options.

Yerbol Sapar
  • 31
  • 1
  • 8