Speed up minimum search in Numpy/Python

Question

I have two floating arrays and want to find data points which match within a certain range. This is what I got so far:

import numpy as np

for vx in range(len(arr1)):
    match = (np.abs(arr2-arr1[vx])).argmin()
    if abs(arr1[vx]-arr2[match])<0.375:
        point = arr2[match]

The problem is that arr1 contains 150000 elements and arr2 around 110000 elements. This takes an awful amount of time. Do you have suggestions to speed things up?

Have you tried [profiling it](http://stackoverflow.com/questions/3927628/how-can-i-profile-python-code-line-by-line)? — huon, Nov 07 '12 at 15:21

score 3 · Accepted Answer · answered Nov 07 '12 at 15:46

3

In addition to not being vectorized, your current search is (n * m) where n is the size of arr2 and m is the size of arr1. In these kinds of searches it helps to sort arr1 or arr2 so you can use a binary search. Sorting ends up being the slowest step but it's still faster if m is large because the n*log(n) sort is faster than (n*m).

Here is how you can do the search in a vectorized way using the sorted array:

def find_closest(A, target):
    #A must be sorted
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return A[idx]

arr2.sort()
closest = find_closest(arr2, arr1)
closest = np.where(abs(closest - arr1) < .375, closest, np.nan)

answered Nov 07 '12 at 15:46

Bi Rico

25,283
3
52
75

Why is searchsorted not enough? I am not quite understanding what happens within the find_closest() function. However it works fine! – HyperCube Nov 12 '12 at 09:34
1

I wrote a more complete explanation of how `find_closest` works in an answer tho this question, http://stackoverflow.com/questions/8914491/finding-the-nearest-value-and-return-the-index-of-array-in-python/8929827#8929827 , but the reason that searchsorted is not enough is because it returns an such that `target <= A[index]`. That is if you do `searchsorted([0, 1], [0, .2, .9, 1.]) == [0, 1, 1, 1])`. But `.2` is closer to 0 than to 1 so we want `[0, 0, 1, 1]`. – Bi Rico Nov 12 '12 at 16:51

score 0 · Answer 2 · answered Nov 07 '12 at 15:32

0

The whole idea of using numpy is to avoid computation with loops.

Specifying criteria to extract a new array that satisfies the criteria can be implemented easily with array computation. Here's an example extracting values from array a which satisfies the criteria that that element has an absolute different of less than 0.75 from the corresponding element in array b:-

a = array([1, 0, 0.5, 1.2])

b = array([1.2, 1.1, 1.3, 1.4])

c = a[abs(a-b)<0.75]

Which gives us

array([ 1. ,  1.2])

answered Nov 07 '12 at 15:32

Calvin Cheng

35,640
39
116
167

`arr1` and `arr2` are different sizes in HyperCube's question, so this won't work. – John Vinyard Nov 07 '12 at 15:34
Indeed, both arrays have different sizes unfortunately. Still good to bear this in mind. The goal I'd like to achieve is to find common values which leads to equally sized arrays. – HyperCube Nov 07 '12 at 20:51

Speed up minimum search in Numpy/Python

2 Answers2