1

I wish to locate the index of the closest higher value to a query over a sorted numpy array (where the query value is not in the array).

similar to bisect_right in the python standard library, without converting the numpy array to a python list, and leveraging the fact that the array is sorted (i.e. runtime should be O(log N), like numpy's searchsorted).

Pandas have this option using get_loc with the 'bfill' option, but it seems a bit of an overkill to include it as a dependency just for this... I might have to resort to holding this array as both a python list and a numpy array, but wanted to hear if there's a more reasonable solution.

Edit: It seems searchsorted does exactly what I need.

Mr.WorshipMe
  • 713
  • 4
  • 16
  • I am not sure what's the question. You have a sorted NumPy array and searchsorted doesn't work for you? – Divakar Jun 13 '20 at 09:14
  • @Divakar searchsorted would return 0 or len(a) depending on the side argument for a value that does not exist in the array - and I want to get the index of closest higher value. – Mr.WorshipMe Jun 13 '20 at 09:27
  • Closest and higher? Don't think you can do both. If you are looking for closest, this should be relevant - https://stackoverflow.com/questions/45349561/. Can you show a minimal sample case where searchsorted doesnt work for your case? – Divakar Jun 13 '20 at 09:41
  • @Divakar Thanks, I was going off of what I understood from numpy's documentation, where it seemed that if a value does not exist it returns 0 or len(a), while in actuality, searchsorted does exactly what I needed. – Mr.WorshipMe Jun 13 '20 at 11:45

1 Answers1

3

We can see the code for bisect_right on github:

def bisect_right(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.
    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x.  So if x already appears in the list, a.insert(x) will
    insert just after the rightmost x already there.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        # Use __lt__ to match the logic in list.sort() and in heapq
        if x < a[mid]: hi = mid
        else: lo = mid+1
    return lo

This is all numpy compliant:

import numpy as np

array = np.array([1,2,3,4,5,6])
print(bisect_right(array, 7))
>>> 6
print(bisect_right(array, 0))
>>> 0

To find the index of the closest higher value to a number given:

def closest_higher_value(array, value):
    if bisect_right(array, value) < len(array):
        return bisect_right(array, value)
    print("value too large:", value, "is bigger than all elements of:")
    print(array)


print(closest_higher_value(array, 3))
>>> 3
print(closest_higher_value(array, 7))
>>> value too large: 7 is bigger than all elements of:
>>> [1 2 3 4 5 6]
>>> None
Krish
  • 1,044
  • 9
  • 20
  • Thanks, this works, but as Divakar noted in a comment to my question, numpy's searchsorted does exactly this, so, no need for bisect_right after all. If you'd include searchsorted in your answer, I'd mark it as chosen. – Mr.WorshipMe Jun 13 '20 at 11:49
  • @Mr.WorshipMe: I think Krish deserves to have their answer marked as the chosen because it answers your exact question: "similar to bisect_right in the python standard library, without converting the numpy array to a python list, and leveraging the fact that the array is sorted" – hekimgil Dec 07 '21 at 04:03
  • @hekimgil Yes, I would choose it if he mentions searchsorted, which is the correct function to use in this case (as it is part of numpy's interface, and not an implementation in python for it). – Mr.WorshipMe Dec 27 '21 at 20:35