Python: index of fist element smaller than threshold in reverse sorted list

Question

Similar question has been asked for a sorted list here, but the solution used bisect which is not working for reserve sorted list.

Say I have a list, sorted in reverse order, keyed on the middle element,

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0] ....]

and I want to apply a series of threshold values on the middle element, which is in a separate sorted list, say

threshold = [0.97, 0.90, 0.83, 0.6]

I am trying to find out the index of the first element smaller than the threshold value. In the above example it should return,

index_list = [2, 2, 3, 6]

Suggestiong on how can it be done in the fastest way ?

score 4 · Accepted Answer · edited May 23 '17 at 12:20

According to this great answer from @gnibbler, you can rewrite bisect code yourself to fit your need

I modify the code from @gnibbler slightly, so that it can be used in your case

An optimization is that since your thresholds are also sorted, we don't need to search the whole list each time, but start from the last result index

def reverse_binary_search(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi: 
        mid = (lo+hi)/2
        if x > a[mid][4]:
            hi = mid 
        else:
            lo = mid+1
    return lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
last_index = 0
for t in threshold:
    last_index = reverse_binary_search(my_list, t, last_index) # next time start search from last_index
    index_list.append(last_index)

Thanks @PhilCooper for valuable suggestions. Here is the code using generator as he proposed:

def reverse_binary_search(a, threshold):
    lo = 0
    for t in threshold:
        if lo < 0:
            raise ValueError('lo must be non-negative')
        hi = len(a)
        while lo < hi: 
            mid = (lo+hi)/2
            if t > a[mid][6]:
                hi = mid 
            else:
                lo = mid+1
        yield lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = list(reverse_binary_search(my_list, threshold))

xvatar: This is working just fine ! exactly what I was looking for ! Guess I should also thank @gnibbler ! — R.Bahl, Jul 06 '12 at 21:58
Accepted the longest answer, that requires a separate function def and uses append in a for loop. FAIL! — Phil Cooper, Jul 06 '12 at 23:38
@xvatar The solutions below using bisect I think qualify as standard libray answers. HOWEVER, I do see that you have optimized for the fact the thresholds is sorted, so yes he said fast and your solution will i think outperform any of the others. NOTE: the `reverse_binary_search` function could be converted to a generator (then there is no need to pass the last_index back in). the "final" line could then take the form of `list(reverse_binary_search(my_list, threshold))`. Like you said, suits OP as-is and it is optimized. — Phil Cooper, Jul 07 '12 at 00:26
@PhilCooper point well made:) I updated the answer as you suggested — xvatar, Jul 07 '12 at 01:00

reptilicus · Answer 2 · 2012-07-07T16:20:54.127

1

Using numpy, which I think looks a bit cleaner than the pure python implementations and is almost certain to be faster:


import numpy as np
arr = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]
idx = [np.min(np.where(arr[:,1] < i)) for i in thresholds if np.where(arr[:,1] < i)[0].size > 0]
print idx
[2, 2, 3, 6]

edited Jul 07 '12 at 16:20

answered Jul 06 '12 at 22:01

reptilicus

10,290
6
55
79

Numpy does work fine, but what I have observed is that, although it is faster than pure python, but the overhead involved in converting list to numpy array can quickly become a bottleneck, specially if the data is large – R.Bahl Jul 06 '12 at 22:08
is there a way to just return the index of the last element i.e say len, if the element is there is no element < threshold ?. Numpy is giving an error as of now – R.Bahl Jul 06 '12 at 22:29
You can add an if statement to test if the np.where() statement caught something. I edited the statement above as an example. . . – reptilicus Jul 07 '12 at 16:22

score 0 · Answer 3 · answered Jul 06 '12 at 21:23

0

Try the following:

threshold = [0.97, 0.90, 0.83, 0.6]
my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
ti = 0
for i, item in enumerate(my_list):
    if item[1] >= threshold[ti]:
        continue
    while ti < len(threshold) and item[1] < threshold[ti]:
        index_list.append(i)
        ti += 1

answered Jul 06 '12 at 21:23

Andrew Clark

202,379
35
273
306

One could also use `itertools.dropwhile` and `itertools.cycle`. – Joel Cornett Jul 06 '12 at 22:15

Phil Cooper · Answer 4 · 2012-07-06T23:53:42.590

I'm thinking your should get the keys and reverse. Then bisecet is OK

from bisect import bisect_left

keys = [vals[1] for vals in my_list]
keys.reverse()
mylen = len(my_list)
[mylen-bisect_left(keys,t) for t in threshold]

If you have numpy already:

my_array = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]

my_array.shape[0]-arr[::-1,1].searchsorted(threshold)

score 0 · Answer 5 · answered Jul 06 '12 at 21:44

0

import bisect
my_list_2  = sorted(my_list, key=lambda x:x[1])
for x in threshold:
    len(my_list) - bisect.bisect([z[1] for z in my_list_2], x)

answered Jul 06 '12 at 21:44

iruvar

22,736
7
53
82

Python: index of fist element smaller than threshold in reverse sorted list

5 Answers5