4

Similar question has been asked for a sorted list here, but the solution used bisect which is not working for reserve sorted list.

Say I have a list, sorted in reverse order, keyed on the middle element,

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0] ....]

and I want to apply a series of threshold values on the middle element, which is in a separate sorted list, say

threshold = [0.97, 0.90, 0.83, 0.6]

I am trying to find out the index of the first element smaller than the threshold value. In the above example it should return,

index_list = [2, 2, 3, 6]

Suggestiong on how can it be done in the fastest way ?

Community
  • 1
  • 1
R.Bahl
  • 399
  • 6
  • 18

5 Answers5

4

According to this great answer from @gnibbler, you can rewrite bisect code yourself to fit your need

I modify the code from @gnibbler slightly, so that it can be used in your case

An optimization is that since your thresholds are also sorted, we don't need to search the whole list each time, but start from the last result index

def reverse_binary_search(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi: 
        mid = (lo+hi)/2
        if x > a[mid][4]:
            hi = mid 
        else:
            lo = mid+1
    return lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
last_index = 0
for t in threshold:
    last_index = reverse_binary_search(my_list, t, last_index) # next time start search from last_index
    index_list.append(last_index)

Thanks @PhilCooper for valuable suggestions. Here is the code using generator as he proposed:

def reverse_binary_search(a, threshold):
    lo = 0
    for t in threshold:
        if lo < 0:
            raise ValueError('lo must be non-negative')
        hi = len(a)
        while lo < hi: 
            mid = (lo+hi)/2
            if t > a[mid][6]:
                hi = mid 
            else:
                lo = mid+1
        yield lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = list(reverse_binary_search(my_list, threshold))
Community
  • 1
  • 1
xvatar
  • 3,229
  • 17
  • 20
  • xvatar: This is working just fine ! exactly what I was looking for ! Guess I should also thank @gnibbler ! – R.Bahl Jul 06 '12 at 21:58
  • 1
    Accepted the longest answer, that requires a separate function def and uses append in a for loop. FAIL! – Phil Cooper Jul 06 '12 at 23:38
  • @xvatar The solutions below using bisect I think qualify as standard libray answers. HOWEVER, I do see that you have optimized for the fact the thresholds is sorted, so yes he said fast and your solution will i think outperform any of the others. NOTE: the `reverse_binary_search` function could be converted to a generator (then there is no need to pass the last_index back in). the "final" line could then take the form of `list(reverse_binary_search(my_list, threshold))`. Like you said, suits OP as-is and it is optimized. – Phil Cooper Jul 07 '12 at 00:26
  • @PhilCooper point well made:) I updated the answer as you suggested – xvatar Jul 07 '12 at 01:00
1

Using numpy, which I think looks a bit cleaner than the pure python implementations and is almost certain to be faster:


import numpy as np
arr = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]
idx = [np.min(np.where(arr[:,1] < i)) for i in thresholds if np.where(arr[:,1] < i)[0].size > 0]
print idx
[2, 2, 3, 6]
reptilicus
  • 10,290
  • 6
  • 55
  • 79
  • Numpy does work fine, but what I have observed is that, although it is faster than pure python, but the overhead involved in converting list to numpy array can quickly become a bottleneck, specially if the data is large – R.Bahl Jul 06 '12 at 22:08
  • is there a way to just return the index of the last element i.e say len, if the element is there is no element < threshold ?. Numpy is giving an error as of now – R.Bahl Jul 06 '12 at 22:29
  • You can add an if statement to test if the np.where() statement caught something. I edited the statement above as an example. . . – reptilicus Jul 07 '12 at 16:22
0

Try the following:

threshold = [0.97, 0.90, 0.83, 0.6]
my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
ti = 0
for i, item in enumerate(my_list):
    if item[1] >= threshold[ti]:
        continue
    while ti < len(threshold) and item[1] < threshold[ti]:
        index_list.append(i)
        ti += 1
Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
0

I'm thinking your should get the keys and reverse. Then bisecet is OK

from bisect import bisect_left

keys = [vals[1] for vals in my_list]
keys.reverse()
mylen = len(my_list)
[mylen-bisect_left(keys,t) for t in threshold]

If you have numpy already:

my_array = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]

my_array.shape[0]-arr[::-1,1].searchsorted(threshold)
Phil Cooper
  • 5,747
  • 1
  • 25
  • 41
0
import bisect
my_list_2  = sorted(my_list, key=lambda x:x[1])
for x in threshold:
    len(my_list) - bisect.bisect([z[1] for z in my_list_2], x)
iruvar
  • 22,736
  • 7
  • 53
  • 82