1

I have two floating arrays and want to find data points which match within a certain range. This is what I got so far:

import numpy as np

for vx in range(len(arr1)):
    match = (np.abs(arr2-arr1[vx])).argmin()
    if abs(arr1[vx]-arr2[match])<0.375:
        point = arr2[match]

The problem is that arr1 contains 150000 elements and arr2 around 110000 elements. This takes an awful amount of time. Do you have suggestions to speed things up?

HyperCube
  • 3,870
  • 9
  • 41
  • 53
  • 3
    Have you tried [profiling it](http://stackoverflow.com/questions/3927628/how-can-i-profile-python-code-line-by-line)? – huon Nov 07 '12 at 15:21

2 Answers2

3

In addition to not being vectorized, your current search is (n * m) where n is the size of arr2 and m is the size of arr1. In these kinds of searches it helps to sort arr1 or arr2 so you can use a binary search. Sorting ends up being the slowest step but it's still faster if m is large because the n*log(n) sort is faster than (n*m).

Here is how you can do the search in a vectorized way using the sorted array:

def find_closest(A, target):
    #A must be sorted
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return A[idx]

arr2.sort()
closest = find_closest(arr2, arr1)
closest = np.where(abs(closest - arr1) < .375, closest, np.nan)
Bi Rico
  • 25,283
  • 3
  • 52
  • 75
  • Why is searchsorted not enough? I am not quite understanding what happens within the find_closest() function. However it works fine! – HyperCube Nov 12 '12 at 09:34
  • 1
    I wrote a more complete explanation of how `find_closest` works in an answer tho this question, http://stackoverflow.com/questions/8914491/finding-the-nearest-value-and-return-the-index-of-array-in-python/8929827#8929827 , but the reason that searchsorted is not enough is because it returns an such that `target <= A[index]`. That is if you do `searchsorted([0, 1], [0, .2, .9, 1.]) == [0, 1, 1, 1])`. But `.2` is closer to 0 than to 1 so we want `[0, 0, 1, 1]`. – Bi Rico Nov 12 '12 at 16:51
0

The whole idea of using numpy is to avoid computation with loops.

Specifying criteria to extract a new array that satisfies the criteria can be implemented easily with array computation. Here's an example extracting values from array a which satisfies the criteria that that element has an absolute different of less than 0.75 from the corresponding element in array b:-

a = array([1, 0, 0.5, 1.2])

b = array([1.2, 1.1, 1.3, 1.4])

c = a[abs(a-b)<0.75]

Which gives us

array([ 1. ,  1.2])
Calvin Cheng
  • 35,640
  • 39
  • 116
  • 167
  • `arr1` and `arr2` are different sizes in HyperCube's question, so this won't work. – John Vinyard Nov 07 '12 at 15:34
  • Indeed, both arrays have different sizes unfortunately. Still good to bear this in mind. The goal I'd like to achieve is to find common values which leads to equally sized arrays. – HyperCube Nov 07 '12 at 20:51