4

For my project I need to repeatedly find the indices of timestamps in lists and if the exact timestamp is not in the list I need to find the index of the timestamp right before the one I'm looking for. I tried looping through the list, but that's very slow:

def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''

    for index in range(len(arr)):
        if arr[index] <= x < arr[index+1]:
            return index

    raise ValueError(f'{x} not in array.')

I also tried to do it recursivly, but that was even slower:

def find_item_index_recursive(arr, x, index = 0):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''

    length = len(arr)

    if length == 1:
        return index

    if arr[length // 2] < x:
        return find_item_index_recursive(arr[length // 2:], x, index + length // 2)
    else:
        return find_item_index_recursive(arr[:length // 2], x, index)

    raise ValueError(f'{x} not in array.')

Is there a faster way to do this?

Tim Berti
  • 67
  • 6
  • 2
    The recursive method is slow because you're making lots of copies by slicing. Pass the start/end indexes along with the original list and it should be fast. – Barmar Apr 23 '21 at 15:10
  • 3
    sort it and use [bisect](https://docs.python.org/3/library/bisect.html)! – ti7 Apr 23 '21 at 15:14
  • Surely sorting will be slower than one loop through the list @ti7 – Pranav Hosangadi Apr 23 '21 at 15:15
  • Does this answer your question? [Fastest way to find Indexes of item in list?](https://stackoverflow.com/questions/35640780/fastest-way-to-find-indexes-of-item-in-list) – youssef jallouli Apr 23 '21 at 15:15
  • @PranavHosangadi definitely, but they need to _repeatedly_ find something and also imply the list is sorted (otherwise why does the index matter?) – ti7 Apr 23 '21 at 15:16
  • To be clear, you are supposed to get an answer even when the value isn't actually in the list? What will you do with this index value? Are the timestamps in order? **What exactly is the overall problem you are trying to solve?** – Karl Knechtel Apr 23 '21 at 18:17

4 Answers4

3

Sort the list and keep track of whether it's sorted before bothering to do any work with it

if not arr_is_sorted:     # create me somewhere!
    arr.sort()            # inplace sort
    arr_is_sorted = True  # unset if you're unsure if the array is sorted

With a sorted list, you can binary search to efficiently O(log n) find the insertion point - there's a convenient builtin library for this, bisect!

import bisect
insertion_point = bisect.bisect_left(arr, x)

This also keeps the array sorted, so you don't need to re-sort it unless you make unrelated changes to it (ideally you would never make an unordered insertion, so it will always be sorted)

Here's a complete example of how to use bisect

>>> l = [100,50,200,99]
>>> l.sort()
>>> l
[50, 99, 100, 200]
>>> import bisect
>>> bisect.bisect_left(l, 55)
1
>>> bisect.bisect_left(l, 201)
4

The you can use arr.insert(position, value) to put the value into the list

>>> l
[50, 99, 100, 200]
>>> value = 55
>>> l.insert(bisect.bisect_left(l, value), value)
>>> l
[50, 55, 99, 100, 200]

You can prevent duplicate insertions by checking if that position is already equal

>>> pos = bisect.bisect_left(l, value)
>>> if pos == len(l) or l[pos] != value:  # length check avoids IndexError
...     l.insert(pos, value)
ti7
  • 16,375
  • 6
  • 40
  • 68
2

this should work fast I think: (I am assuming that your timestamps are sorted?)

def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''
    
    l = len(arr)
    i = l//2
    j = i//2
    
    while(j>0):
        if x<arr[i]:
            i-= j
        else:
            i+= j
        j = j//2
    return i

Edit: I just checked. Compared to your first version it is faster for longer lists.. I expect at least 4 times, if list gets longer even 10 times

eli44
  • 111
  • 1
  • 12
1

List has an in-built method which will give you the index of an element. If the element is not found then it'll raise value error.

try:
    index = list1.index(element_to_search)
except ValueError as e:
    print('element not found')
Nk03
  • 14,699
  • 2
  • 8
  • 22
  • 1
    That helps, if the exact item is in the array, but not if the item i'm looking for is between two entries of the array. – Tim Berti Apr 23 '21 at 15:28
1

Numpy searchsorted is usually involved in these cases:

np.searchsorted([1,2,8,9], 5) # Your case
> 2

np.searchsorted([1,2,8,9], (-1, 2, 100))  #Other cases
> array([0, 1, 4])

index in missing cases refers to the near right. If this is not your case, this can be modified in order to obtain the near left position.

Glauco
  • 1,385
  • 2
  • 10
  • 20