How to select value from array that is closest to value in array using vectorization?

Question

I have an array of values that I want to replace with from an array of choices based on which choice is linearly closest.

The catch is the size of the choices is defined at runtime.

import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])

If choices was static in size, I would simply use np.where

d = np.where(np.abs(a - choices[0]) > np.abs(a - choices[1]), 
      np.where(np.abs(a - choices[0]) > np.abs(a - choices[2]), choices[0], choices[2]),
         np.where(np.abs(a - choices[1]) > np.abs(a - choices[2]), choices[1], choices[2]))

To get the output:

>>d
>>[[1, 1, 1], [5, 5, 5], [10, 10, 10]]

Is there a way to do this more dynamically while still preserving the vectorization.

Subtract ```choices``` from ```a``` find the index of the minimum of the result, substitute. — wwii, Nov 21 '16 at 04:30

wwii · Accepted Answer · 2016-11-21T15:30:32.047

Subtract choices from a, find the index of the minimum of the result, substitute.

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>> 
[[ 1  1  1]
 [ 5  5  5]
 [10 10 10]]

a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>>    
[[ 1  1  1]
 [ 5 10  5]
 [10  1 10]]
>>>

The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, This link has a decent graphic. b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.

For the second example:

>>> print b
[[[ 1  5 10]
  [ 2  2  7]
  [ 1  5 10]]

 [[ 3  1  6]
  [ 7  3  2]
  [ 3  1  6]]

 [[ 8  4  1]
  [ 0  4  9]
  [ 8  4  1]]]
>>> print i
[[0 0 0]
 [1 2 1]
 [2 0 2]]
>>>

The final assignment uses an Index Array or Integer Array Indexing.

In the second example, notice that there was a tie for element a[0,1] , either one or five could have been substituted.

so to see if I'm understanding the idea is make choices the same shape as A? so what if A was something like a 10x10? — dranobob, Nov 21 '16 at 05:02
@dranobob, see edit - hopefully that clears it up. If ```a``` was originally (10,10) then ```b``` would have been (1010,3). We created a third dimension for ```a``` that ```choices``` could be broadcast against. — wwii, Nov 21 '16 at 05:09

score 2 · Answer 2 · edited May 23 '17 at 11:54

To explain wwii's excellent answer in a little more detail:

The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:

>>> b = np.abs(a[..., np.newaxis] - choices)
array([[[ 1,  5, 10],
        [ 1,  5, 10],
        [ 1,  5, 10]],
       [[ 3,  1,  6],
        [ 3,  1,  6],
        [ 3,  1,  6]],
       [[ 8,  4,  1],
        [ 8,  4,  1],
        [ 8,  4,  1]]])

Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:

>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Which finally allows you to choose those elements from choices:

>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])

For a non-symmetric shape:

Let's say a had shape (2, 5):

>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Then you'd get:

>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 0,  4,  9],
        [ 1,  3,  8],
        [ 2,  2,  7],
        [ 3,  1,  6]],

       [[ 4,  0,  5],
        [ 5,  1,  4],
        [ 6,  2,  3],
        [ 7,  3,  2],
        [ 8,  4,  1]]])

This is hard to read, but what it's saying is, b has shape:

>>> b.shape
(2, 5, 3)

The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:

>>> b[:, :, 0]  # = abs(a - 1)
array([[1, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> b[:, :, 1]  # = abs(a - 5)
array([[5, 4, 3, 2, 1],
       [0, 1, 2, 3, 4]])
>>> b[:, :, 2]  # = abs(a - 10)
array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 1, 2, 3.

Hope that helps explain this a little more clearly.

How would this change if the dimensions were way off, like A was 17 x 7? — dranobob, Nov 21 '16 at 05:05
@dranobob It won't change at all. For extremely large array sizes, your program could take up a lot of memory. But for 17x7, you wouldn't need to worry — Praveen, Nov 21 '16 at 05:07
@dranobob In fact, it wouldn't even change if your `a` matrix had shape (4,) or (12, 7, 8) or anything like that, since the ellipsis takes care, no matter how many dimensions `a` has. — Praveen, Nov 21 '16 at 05:09
This is very cool, but I'm still trying to wrap my head around vectorization concepts. Can you help me visualize how such an odd A matrix would work in this setup. I can easily picture the 3x3 — dranobob, Nov 21 '16 at 05:09
I'm going to keep staring at this till it make sense. I've been coding all weekend so its possible I might need to take a stab at it in the morning :) — dranobob, Nov 21 '16 at 05:27

score 2 · Answer 3 · edited May 23 '17 at 12:25

2

I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -

def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl

Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.

Runtime test -

In [160]: # Setup inputs
     ...: a = np.random.rand(100,100)
     ...: choices = np.sort(np.random.rand(100))
     ...: 

In [161]: def broadcasting_app(a, choices): # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
     ...: 

In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True

In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop

In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop

Related post : Find elements of array one nearest to elements of array two

edited May 23 '17 at 12:25

Community

1
1

answered Nov 21 '16 at 09:15

Divakar

218,885
19
262
358

In the case of a tie (integer arrays), your's favors the *right* index while mine favors the *left*. OP didn't comment when I pointed out the tie behavior. I guess if the arrays are floats the likelihood of ties would be minimal. – wwii Nov 21 '16 at 15:59
Interesting, it seems the performance difference varies with the size of ```choices``` but not with the size of ```a```. – wwii Nov 21 '16 at 16:05
@wwii Ties with the indices? I guess with the final indexing step of `choices[i]`, the selected values would be the same either way, right? Hmm, well I took the shapes representative of what OP had in sample, i.e. `choices.size = n` and `a.shape = (n,n)`. – Divakar Nov 21 '16 at 16:57
In my second example, ```a[0,1]``` is three which is equidistant from two of the choices, one and five. That is what I am calling a tie, Either one or five could be substituted per the OP's spec -both are correct. Your method chooses five and mine chooses one. But this probably only happens when the inputs are integer arrays. I like the way yours works, I'm certainly not in the habit of thinking that way. – wwii Nov 21 '16 at 17:34
1

@wwii Ah I see now! Well OP needs to take a call on those cases then because not specified . – Divakar Nov 21 '16 at 17:36

How to select value from array that is closest to value in array using vectorization?

3 Answers3

Linked