2

I have an array of values that I want to replace with from an array of choices based on which choice is linearly closest.

The catch is the size of the choices is defined at runtime.

import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])

If choices was static in size, I would simply use np.where

d = np.where(np.abs(a - choices[0]) > np.abs(a - choices[1]), 
      np.where(np.abs(a - choices[0]) > np.abs(a - choices[2]), choices[0], choices[2]),
         np.where(np.abs(a - choices[1]) > np.abs(a - choices[2]), choices[1], choices[2]))

To get the output:

>>d
>>[[1, 1, 1], [5, 5, 5], [10, 10, 10]]

Is there a way to do this more dynamically while still preserving the vectorization.

Divakar
  • 218,885
  • 19
  • 262
  • 358
dranobob
  • 796
  • 1
  • 5
  • 19
  • 1
    Subtract ```choices``` from ```a``` find the index of the minimum of the result, substitute. – wwii Nov 21 '16 at 04:30

3 Answers3

3

Subtract choices from a, find the index of the minimum of the result, substitute.

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>> 
[[ 1  1  1]
 [ 5  5  5]
 [10 10 10]]

a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>>    
[[ 1  1  1]
 [ 5 10  5]
 [10  1 10]]
>>> 

The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, This link has a decent graphic. b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.

For the second example:

>>> print b
[[[ 1  5 10]
  [ 2  2  7]
  [ 1  5 10]]

 [[ 3  1  6]
  [ 7  3  2]
  [ 3  1  6]]

 [[ 8  4  1]
  [ 0  4  9]
  [ 8  4  1]]]
>>> print i
[[0 0 0]
 [1 2 1]
 [2 0 2]]
>>> 

The final assignment uses an Index Array or Integer Array Indexing.

In the second example, notice that there was a tie for element a[0,1] , either one or five could have been substituted.

wwii
  • 23,232
  • 7
  • 37
  • 77
  • so to see if I'm understanding the idea is make choices the same shape as A? so what if A was something like a 10x10? – dranobob Nov 21 '16 at 05:02
  • @dranobob, see edit - hopefully that clears it up. If ```a``` was originally (10,10) then ```b``` would have been (1010,3). We created a third dimension for ```a``` that ```choices``` could be broadcast against. – wwii Nov 21 '16 at 05:09
2

To explain wwii's excellent answer in a little more detail:

The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:

>>> b = np.abs(a[..., np.newaxis] - choices)
array([[[ 1,  5, 10],
        [ 1,  5, 10],
        [ 1,  5, 10]],
       [[ 3,  1,  6],
        [ 3,  1,  6],
        [ 3,  1,  6]],
       [[ 8,  4,  1],
        [ 8,  4,  1],
        [ 8,  4,  1]]])

Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:

>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Which finally allows you to choose those elements from choices:

>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])

For a non-symmetric shape:

Let's say a had shape (2, 5):

>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Then you'd get:

>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 0,  4,  9],
        [ 1,  3,  8],
        [ 2,  2,  7],
        [ 3,  1,  6]],

       [[ 4,  0,  5],
        [ 5,  1,  4],
        [ 6,  2,  3],
        [ 7,  3,  2],
        [ 8,  4,  1]]])

This is hard to read, but what it's saying is, b has shape:

>>> b.shape
(2, 5, 3)

The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:

>>> b[:, :, 0]  # = abs(a - 1)
array([[1, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> b[:, :, 1]  # = abs(a - 5)
array([[5, 4, 3, 2, 1],
       [0, 1, 2, 3, 4]])
>>> b[:, :, 2]  # = abs(a - 10)
array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 1, 2, 3.

Hope that helps explain this a little more clearly.

Community
  • 1
  • 1
Praveen
  • 6,872
  • 3
  • 43
  • 62
  • How would this change if the dimensions were way off, like A was 17 x 7? – dranobob Nov 21 '16 at 05:05
  • @dranobob It won't change at all. For extremely large array sizes, your program could take up a lot of memory. But for 17x7, you wouldn't need to worry – Praveen Nov 21 '16 at 05:07
  • 1
    @dranobob In fact, it wouldn't even change if your `a` matrix had shape (4,) or (12, 7, 8) or anything like that, since the ellipsis takes care, no matter how many dimensions `a` has. – Praveen Nov 21 '16 at 05:09
  • This is very cool, but I'm still trying to wrap my head around vectorization concepts. Can you help me visualize how such an odd A matrix would work in this setup. I can easily picture the 3x3 – dranobob Nov 21 '16 at 05:09
  • I'm going to keep staring at this till it make sense. I've been coding all weekend so its possible I might need to take a stab at it in the morning :) – dranobob Nov 21 '16 at 05:27
2

I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -

def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl

Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.

Runtime test -

In [160]: # Setup inputs
     ...: a = np.random.rand(100,100)
     ...: choices = np.sort(np.random.rand(100))
     ...: 

In [161]: def broadcasting_app(a, choices): # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
     ...: 

In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True

In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop

In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop

Related post : Find elements of array one nearest to elements of array two

Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • In the case of a tie (integer arrays), your's favors the *right* index while mine favors the *left*. OP didn't comment when I pointed out the tie behavior. I guess if the arrays are floats the likelihood of ties would be minimal. – wwii Nov 21 '16 at 15:59
  • Interesting, it seems the performance difference varies with the size of ```choices``` but not with the size of ```a```. – wwii Nov 21 '16 at 16:05
  • @wwii Ties with the indices? I guess with the final indexing step of `choices[i]`, the selected values would be the same either way, right? Hmm, well I took the shapes representative of what OP had in sample, i.e. `choices.size = n` and `a.shape = (n,n)`. – Divakar Nov 21 '16 at 16:57
  • In my second example, ```a[0,1]``` is three which is equidistant from two of the choices, one and five. That is what I am calling a tie, Either one or five could be substituted per the OP's spec -both are correct. Your method chooses five and mine chooses one. But this probably only happens when the inputs are integer arrays. I like the way yours works, I'm certainly not in the habit of thinking that way. – wwii Nov 21 '16 at 17:34
  • 1
    @wwii Ah I see now! Well OP needs to take a call on those cases then because not specified . – Divakar Nov 21 '16 at 17:36