479

How do I find the nearest value in a numpy array? Example:

np.find_nearest(array, value)
Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
Fookatchu
  • 7,125
  • 5
  • 28
  • 26

20 Answers20

720
import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

Example usage:

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

print(find_nearest(array, value=0.5))
# 0.568743859261
Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • 60
    @EOL: `return np.abs(array-value).min()` gives the wrong answer. This gives you the min of the absolute value distance, and somehow we need to return the actual array value. We could add `value` and come close, but the absolute value throws a wrench into things... – unutbu Apr 02 '10 at 18:51
  • 9
    @~unutbu You're right, my bad. I can't think of anything better than your solution! – Eric O. Lebigot Apr 03 '10 at 23:07
  • 42
    seems crazy there isn't a numpy built-in that does this. – abcd Apr 08 '15 at 19:32
  • 2
    Can someone explain what is the time complexity of this algorithm? – jsmedmar Aug 01 '16 at 17:26
  • 2
    @jsmedmar `O(n)` unless is doing something strange: `n` operations to apply the vectorized subtraction, `n` operations to find the minimum, up to `n` operations to get the element based on the index. `O(3n) = O(n)` – Οurous Oct 08 '16 at 03:52
  • 3
    @jsmedmar The bisection method (see my below answer) is O(log(n)). – Josh Albert Jan 25 '17 at 16:28
  • 5
    `FutureWarning: 'argmin' is deprecated. Use 'idxmin' instead. The behavior of 'argmin' will be corrected to return the positional minimum in the future. Use 'series.values.argmin' to get the position of the minimum now.` Using `idxmin` instead of `argmin` works for me with the solution above. (v3.6.4) – gosuto May 15 '18 at 06:36
  • 2
    @jorijnsmit: To make the code compatible with Pandas Series, I've added `array = np.asarray(array)`. This will convert the Series into a NumPy array. Then calling NumPy's [`argmin`](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.argmin.html) will work as expected. I can not change it to `idxmin` because that would break the code if `find_nearest` were passed a NumPy array. – unutbu May 15 '18 at 11:01
  • It is worth pointing out that although this works on unsorted data, if there there are multiple array elements with equal absolute distance to the value, the 1st value found will be returned. This is because of use of [``numpy.argmin``](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.argmin.html) which follows the rule: ``In case of multiple occurrences of the minimum values, the indices corresponding to the first occurrence are returned.`` – tsherwen Jun 13 '18 at 17:42
  • amazing answer, I tried this with a `groupby().transform(lambda)` but I kept getting a shape mismatch error, the vectorised version below seemed to work. if you don't mind could you explain why? – Umar.H Apr 16 '19 at 20:48
  • This is really helpful, but is there a way to reduce the tolerance of the function. For example, if I want to search for 10, but this value isnt represented, the function will return say... 18? – user9454050 Oct 18 '19 at 00:56
  • 1
    Hello @unutbu. Am I allowed to use this code for commercial use? – Berkcem Nov 29 '19 at 13:50
  • 22
    big fat warning: if your data contains np.nan, those points will always come out as nearest. – johanvdw Nov 10 '20 at 09:31
  • 15
    @johanvdw wow that should almost count as a bug. To fix it, replace `np.argmin()` with `np.nanargmin()` and it works. – eric Mar 18 '21 at 15:10
  • [Numpy searchsorted](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html) works for any number of query points, 1 or 1000. One *can* do queries one by one, `for q in querypoints: n = find_nearest( q )`, but sorting all 1000, then essentially mergesort, is way faster. (I think `searchsorted` does that, don't know.) See `class Mid_rgrid` [below](https://stackoverflow.com/questions/2566412/find-nearest-value-in-numpy-array/69479668#69479668) which finds nearest-neighbor of any number of queries with searchsorted. It works on regular grids in 2d, 3d ... too. – denis Oct 17 '21 at 16:07
  • I've always kept this function in my pocket. Except that I usually return `idx, array[idx]` so I can use the indices elsewhere. – jtlz2 Apr 07 '22 at 08:14
119

IF your array is sorted and is very large, this is a much faster solution:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. You can easily modify the above to sort in the method if you can't assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.

DilithiumMatrix
  • 17,795
  • 22
  • 77
  • 119
Demitri
  • 13,134
  • 4
  • 40
  • 41
  • 1
    That sounds like the most reasonable solution. I wonder why it is so slow anyways. Plain `np.searchsorted` takes about 2 µs for my test set, the whole function about 10 µs. Using `np.abs` it's getting even worse. No clue what python is doing there. – Michael Feb 17 '15 at 18:07
  • 2
    @Michael For single values, the Numpy math routines will be slower than the `math` routines, see [this answer](http://stackoverflow.com/questions/3650194/are-numpys-math-functions-faster-than-pythons). – Demitri Feb 18 '15 at 14:53
  • 5
    This is the best solution if you have multiple values you want to look up at once (with a few adjustments). The whole `if/else` needs to be replaced with `idx = idx - (np.abs(value - array[idx-1]) < np.abs(value - array[idx])); return array[idx]` – coderforlife Jan 08 '16 at 07:58
  • 4
    This is great but doesn't work if `value` is bigger than `array`'s biggest element. I changed the `if` statement to `if idx == len(array) or math.fabs(value - array[idx - 1]) < math.fabs(value - array[idx])` to make it work for me! – nicoco May 03 '16 at 13:06
  • 3
    This doesn't work when idx is 0. The if should read: `if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):` – JPaget May 11 '16 at 04:51
80

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]
kwgoodman
  • 2,088
  • 16
  • 7
27

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first - note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

Now I'll define the code from the other answers, they each return an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I'll time the codes: Note methods 1,2,4,5 don't correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 - 1000 times faster). For smaller arrays it's ~2-100 times faster.

Josh Albert
  • 1,064
  • 13
  • 16
  • 3
    You're assuming that the array is sorted. There are many reasons why someone would not want to sort the array: for example, if the array represented the data points on a line graph. – adamcircle Jun 08 '17 at 17:29
  • 12
    The python standard library already contains in implementation of the bisection algorithm: https://docs.python.org/3.6/library/bisect.html – Felix Jul 24 '17 at 15:11
  • When you said, "if `array` is small then method 2 seems the fastest." how small did you mean @JoshAlbert? – Mr.Zeus Aug 15 '18 at 19:01
  • 2
    This doesn't find the *nearest* value, it finds the next-lowest value. – endolith Sep 18 '18 at 20:00
  • @endolith that's the case for bisect only. – Homero Esmeraldo Feb 26 '19 at 20:11
23

Here is a fast vectorized version of @Dimitri's solution if you have many values to search for (values can be multi-dimensional array):

# `values` should be sorted
def get_closest(array, values):
    # make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")
    
    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1
    
    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri's solution`

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
David Parks
  • 30,789
  • 47
  • 185
  • 328
anthonybell
  • 5,790
  • 7
  • 42
  • 60
  • in case you have constant sampling in the array, it becomes even simpler: `idx = np.searchsorted(array, values)` then: `idx[array[idx] - values>np.diff(array).mean()*0.5]-=1` and finally `return array[idx]` – Sergey Antopolskiy Feb 26 '18 at 11:53
  • 4
    First answer that "just works": `get_closest([1,5,10,20], [1,4,16]) -> [1, 5, 20]`, this one should have more upvotes. – David Parks Sep 22 '21 at 19:59
  • Exactly what I needed and incredibly fast! Thank you so much Anthony! – egor.ananyev Nov 17 '21 at 00:01
  • any simple way to distinguish if it should be nearest from the left (like lower than) or from the right (like higher than) given value? – Flash Thunder Sep 08 '22 at 07:39
  • Note that the input variable `array` needs to be sorted in ascending order per `np.searchsorted` requirement. – nwly Mar 05 '23 at 02:05
  • Even works when the array has shifted indices caused by passing in `array[3:]`. Perfect! – YPOC May 15 '23 at 13:45
19

Here's an extension to find the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])
Onasafari
  • 458
  • 3
  • 7
  • I think `norm(..., axis=-1)` should be faster than extracting the `x,y` values through Python iteration. Also, `x,y` are scalars here? Then `norm(x+y)` is a bug since, e.g., the distance `(+1, -1)` will be treated as 0. – cfh Jul 20 '17 at 12:00
  • This worked for me `idx = np.array([np.linalg.norm(x+y) for (x,y) in abs(array-value)]).argmin()` – ezChx Apr 24 '20 at 03:44
11

If you don't want to use numpy this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]
Nick Crawford
  • 5,086
  • 2
  • 21
  • 20
10

Here's a version that will handle a non-scalar "values" array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]
mevatron
  • 13,911
  • 4
  • 55
  • 72
ryggyr
  • 3,341
  • 2
  • 20
  • 14
  • Good answer, I've never used the `outer` method of a ufunc before, I think I'll be using it more in the future. The first function should return `array[indices]`, by the way. – Widjet Nov 20 '15 at 07:38
  • 2
    This solution does not scale. `np.subtract.outer` will generate the entire outer-product matrix which is really slow and memory intensive if `array` and/or `values` is very large. – anthonybell Sep 12 '17 at 19:22
9

Here is a version with scipy for @Ari Onasafari, answer "to find the nearest vector in an array of vectors"

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])
efirvida
  • 4,592
  • 3
  • 42
  • 68
  • Building a KDTree is quite an overhead for such a problem. I would not recommend such a solution unless you have to make multiple queries on a big array ... And then, it would be better to build it once and reuse it, rather than creating it on the fly for each query. – Ben Apr 12 '17 at 13:19
8

For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

  1. The function below works whether or not the input array is sorted.

  2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest
JPaget
  • 969
  • 10
  • 13
aph
  • 1,765
  • 2
  • 19
  • 34
  • 1
    It's worth pointing out that this is a great example of how optimizing code tends to make it uglier and harder to read. The answer given by @unutbu should be (much) preferred in cases where speed is not a major concern, since it is far more transparent. – aph Apr 08 '15 at 15:01
  • I don't see the answer given by @Michael. Is this an error or am I blind? – Fookatchu Apr 09 '15 at 09:55
  • Nope, you're not blind, I'm just illiterate ;-) It was @Demitri whose answer I was riffing on. My bad. I just fixed my post. Thanks! – aph Apr 09 '15 at 13:57
  • I get different answers with Demitri's and yours. Any ideas? ``x = np.array([2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787, 2287, 1940, 2311, 2054, 2406, 1471, 1460])``. With ``find_nearest(x, 1739.5)`` (closest value to the first quantile), I get ``1637`` (reasonable) and ``1`` (bug?). – PatrickT Feb 03 '18 at 12:16
  • Agree with PatrickT, this version's buggy. Recommend @anthonybell's solution, which is faster than Demitri's. – nwly Mar 05 '23 at 00:18
6

I think the most pythonic way would be:

 num = 65 # Input number
 array = np.random.random((10))*100 # Given array 
 nearest_idx = np.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can use it as a function if you want

KansaiRobot
  • 7,564
  • 11
  • 71
  • 150
Ishan Tomar
  • 1,488
  • 1
  • 16
  • 20
6

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left") CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side="left") CPU times: user 4 ms, sys: 0 ns, total: 4 ms Wall time: 1.2 ms

If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.

Soumen
  • 83
  • 2
  • 6
5

This is a vectorized version of unutbu's answer:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)
Zhanwen Chen
  • 1,295
  • 17
  • 21
5

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]
Gusev Slava
  • 2,136
  • 3
  • 21
  • 26
1

For 2d array, to determine the i, j position of nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j
1

Here is a version that works with 2D arrays, using scipy's cdist function if the user has it, and a simpler distance calculation if they don't.

By default, the output is the index that is closest to the value you input, but you can change that with the output keyword to be one of 'index', 'value', or 'both', where 'value' outputs array[index] and 'both' outputs index, array[index].

For very large arrays, you may need to use kind='euclidean', as the default scipy cdist function may run out of memory.

This is maybe not the absolute fastest solution, but it is quite close.

def find_nearest_2d(array, value, kind='cdist', output='index'):
    # 'array' must be a 2D array
    # 'value' must be a 1D array with 2 elements
    # 'kind' defines what method to use to calculate the distances. Can choose one
    #    of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
    #    arrays. Otherwise, cdist is much faster.
    # 'output' defines what the output should be. Can be 'index' (default) to return
    #    the index of the array that is closest to the value, 'value' to return the
    #    value that is closest, or 'both' to return index,value
    import numpy as np
    if kind == 'cdist':
        try: from scipy.spatial.distance import cdist
        except ImportError:
            print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
            kind = 'euclidean'
    index = np.where(array == value)[0] # Make sure the value isn't in the array
    if index.size == 0:
        if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
        elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
        else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
    if output == 'index': return index
    elif output == 'value': return array[index]
    elif output == 'both': return index,array[index]
    else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")
boof
  • 339
  • 1
  • 3
  • 13
1

For those searching for multiple nearest, modifying the accepted answer:

import numpy as np
def find_nearest(array, value, k):
    array = np.asarray(array)
    idx = np.argsort(abs(array - value))[:k]
    return array[idx]

See: https://stackoverflow.com/a/66937734/11671779

Muhammad Yasirroni
  • 1,512
  • 12
  • 22
0
import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))
  • 2
    Hi, welcome to Stack Overflow. [Check out how to write a good answer](https://stackoverflow.com/help/how-to-answer). Try giving a short description of what you did in the context of the question! – Tristo Sep 08 '18 at 10:14
0

This one handles any number of queries, using numpy searchsorted, so after sorting the input arrays, is just as fast. It works on regular grids in 2d, 3d ... too: enter image description here

#!/usr/bin/env python3
# keywords: nearest-neighbor regular-grid python numpy searchsorted Voronoi

import numpy as np

#...............................................................................
class Near_rgrid( object ):
    """ nearest neighbors on a Manhattan aka regular grid
    1d:
    near = Near_rgrid( x: sorted 1d array )
    nearix = near.query( q: 1d ) -> indices of the points x_i nearest each q_i
        x[nearix[0]] is the nearest to q[0]
        x[nearix[1]] is the nearest to q[1] ...
        nearpoints = x[nearix] is near q
    If A is an array of e.g. colors at x[0] x[1] ...,
    A[nearix] are the values near q[0] q[1] ...
    Query points < x[0] snap to x[0], similarly > x[-1].

    2d: on a Manhattan aka regular grid,
        streets running east-west at y_i, avenues north-south at x_j,
    near = Near_rgrid( y, x: sorted 1d arrays, e.g. latitide longitude )
    I, J = near.query( q: nq × 2 array, columns qy qx )
    -> nq × 2 indices of the gridpoints y_i x_j nearest each query point
        gridpoints = np.column_stack(( y[I], x[J] ))  # e.g. street corners
        diff = gridpoints - querypoints
        distances = norm( diff, axis=1, ord= )
    Values at an array A definded at the gridpoints y_i x_j nearest q: A[I,J]

    3d: Near_rgrid( z, y, x: 1d axis arrays ) .query( q: nq × 3 array )

    See Howitworks below, and the plot Voronoi-random-regular-grid.
    """

    def __init__( self, *axes: "1d arrays" ):
        axarrays = []
        for ax in axes:
            axarray = np.asarray( ax ).squeeze()
            assert axarray.ndim == 1, "each axis should be 1d, not %s " % (
                    str( axarray.shape ))
            axarrays += [axarray]
        self.midpoints = [_midpoints( ax ) for ax in axarrays]
        self.axes = axarrays
        self.ndim = len(axes)

    def query( self, queries: "nq × dim points" ) -> "nq × dim indices":
        """ -> the indices of the nearest points in the grid """
        queries = np.asarray( queries ).squeeze()  # or list x y z ?
        if self.ndim == 1:
            assert queries.ndim <= 1, queries.shape
            return np.searchsorted( self.midpoints[0], queries )  # scalar, 0d ?
        queries = np.atleast_2d( queries )
        assert queries.shape[1] == self.ndim, [
                queries.shape, self.ndim]
        return [np.searchsorted( mid, q )  # parallel: k axes, k processors
                for mid, q in zip( self.midpoints, queries.T )]

    def snaptogrid( self, queries: "nq × dim points" ):
        """ -> the nearest points in the grid, 2d [[y_j x_i] ...] """
        ix = self.query( queries )
        if self.ndim == 1:
            return self.axes[0][ix]
        else:
            axix = [ax[j] for ax, j in zip( self.axes, ix )]
            return np.array( axix )


def _midpoints( points: "array-like 1d, *must be sorted*" ) -> "1d":
    points = np.asarray( points ).squeeze()
    assert points.ndim == 1, points.shape
    diffs = np.diff( points )
    assert np.nanmin( diffs ) > 0, "the input array must be sorted, not %s " % (
            points.round( 2 ))
    return (points[:-1] + points[1:]) / 2  # floats

#...............................................................................
Howitworks = \
"""
How Near_rgrid works in 1d:
Consider the midpoints halfway between fenceposts | | |
The interval [left midpoint .. | .. right midpoint] is what's nearest each post --

    |   |       |                     |   points
    | . |   .   |          .          |   midpoints
      ^^^^^^               .            nearest points[1]
            ^^^^^^^^^^^^^^^             nearest points[2]  etc.

2d:
    I, J = Near_rgrid( y, x ).query( q )
    I = nearest in `x`
    J = nearest in `y` independently / in parallel.
    The points nearest [yi xj] in a regular grid (its Voronoi cell)
    form a rectangle [left mid x .. right mid x] × [left mid y .. right mid y]
    (in any norm ?)
    See the plot Voronoi-random-regular-grid.

Notes
-----
If a query point is exactly halfway between two data points,
e.g. on a grid of ints, the lines (x + 1/2) U (y + 1/2),
which "nearest" you get is implementation-dependent, unpredictable.

"""

Murky = \
""" NaNs in points, in queries ?
"""

__version__ = "2021-10-25 oct  denis-bz-py"
denis
  • 21,378
  • 10
  • 65
  • 88
0

I have here a version for sorted inputs, that for a some values in A finds the indices of closest elements in B:

from cmath import inf

import numba
import numpy as np


@numba.njit
def get_indices_of_closest_questioned_points(
    interogators: npt.NDArray,
    questioned: npt.NDArray,
) -> npt.NDArray:
    """For each element in `interogators` get the index of the closest element in set `questioned`.
    """
    res = np.empty(shape=interogators.shape, dtype=np.uint32)
    N = len(interogators)
    M = len(questioned)
    n = m = 0
    closest_left_to_x = -inf
    while n < N and m < M:
        x = interogators[n]
        y = questioned[m]
        if y < x:
            closest_left_to_x = y
            m += 1
        else:
            res[n] = m - (x - closest_left_to_x < y - x)
            n += 1
    while n < N:
        res[n] = M - 1
        n += 1
    return res

sorting is a heavily optimized operation, that runs in O(nlogn) or O(n) depending on the input and used algorithm. Above code is obviously also O(n), numba makes it run faster to numpy speeds.

Below an examplary usage:

In [12]: get_indices_of_closest_questioned_points(np.array([0,5,10]), np.array([-1,2,6,8,9,10]))
Out[12]: array([0, 2, 5], dtype=uint32)

The result is 0 2 5 because -1 is closest to 0 and it's the 0th element of the second array, 5 is closest to 6 that is the 2th element in the second array, and so on.

In case of an input such as [0] and [-1,1], the first of closest elements, -1, will be returned.

Best wishes,

MatteoLacki
  • 423
  • 4
  • 4