10

I have a large numpy array:

array([[32, 32, 99,  9, 45],  # A
       [99, 45,  9, 45, 32],
       [45, 45, 99, 99, 32],
       [ 9,  9, 32, 45, 99]])

and a large-ish array of unique values in a particular order:

array([ 99, 32, 45, 9])       # B

How can I quickly (no Python dictionaries, no copies of A, no Python loops) replace the values in A so that they become the indices of the corresponding values in B?

array([[1, 1, 0, 3, 2],
       [0, 2, 3, 2, 1],
       [2, 2, 0, 0, 1],
       [3, 3, 1, 2, 0]])

I feel really dumb for not being able to do this off the top of my head or find it in the documentation. Easy points!

Paul

2 Answers

8

Here you go

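import numpy as np

A = np.array([[32, 32, 99,  9, 45],  # A
              [99, 45,  9, 45, 32],
              [45, 45, 99, 99, 32],
              [ 9,  9, 32, 45, 99]])

B = np.array([99, 32, 45, 9])

# ii[k] is the position in B of the k-th smallest value of B
ii = np.argsort(B)
# C holds, for each element of the flattened A, its rank within sorted B
C = np.digitize(A.reshape(-1,), np.sort(B)) - 1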

Originally I suggested:

D = np.choose(C,ii).reshape(A.shape)

But I realized that np.choose runs into limitations with larger arrays, because it accepts only a limited number of choices. Instead, borrowing from @unutbu's clever reply:

D = np.argsort(B)[C].reshape(A.shape)

Or the one-liner

np.argsort(B)[np.digitize(A.reshape(-1,),np.sort(B)) - 1].reshape(A.shape)

I found this to be faster or slower than @unutbu's code depending on the size of the arrays under consideration and the number of unique values.
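For anyone who wants to check the timing claim on their own data, here is a rough sketch that times both approaches. The array sizes, the random test data, and the helper names `via_digitize` and `via_searchsorted` are assumptions made for illustration, not part of either answer, and it assumes every value in A occurs in B.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
B = rng.permutation(1000)            # unique values in an arbitrary order (assumed size)
A = rng.choice(B, size=(200, 200))   # every element of A occurs somewhere in B

def via_digitize(A, B):
    # digitize-based one-liner from this answer
    return np.argsort(B)[np.digitize(A.reshape(-1,), np.sort(B)) - 1].reshape(A.shape)

def via_searchsorted(A, B):
    # searchsorted-based variant from @unutbu's answer
    key = np.argsort(B)
    return key[B[key].searchsorted(A)]

# Both approaches should produce identical index arrays.
same = np.array_equal(via_digitize(A, B), via_searchsorted(A, B))

t_digitize = timeit.timeit(lambda: via_digitize(A, B), number=5)
t_searchsorted = timeit.timeit(lambda: via_searchsorted(A, B), number=5)
print(same, t_digitize, t_searchsorted)
```

Varying the sizes of A and B should show how the relative speed shifts. The absolute numbers will depend on your machine and NumPy version.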

JoshAdel
  • This solution performed moderately faster for my use-case (B.size< – Paul Feb 22 '11 at 01:52
  • I also found unutbu's solution to be generally faster except when B.size << A.size. It's always fun to see multiple solutions and tinker with optimization – JoshAdel Feb 22 '11 at 02:00
7
import numpy as np
A=np.array([[32, 32, 99,  9, 45],  
            [99, 45,  9, 45, 32],
            [45, 45, 99, 99, 32],
            [ 9,  9, 32, 45, 99]])

B=np.array([ 99, 32, 45, 9])

cutoffs=np.sort(B)
print(cutoffs)
# [ 9 32 45 99]

index=cutoffs.searchsorted(A)
print(index)
# [[1 1 3 0 2]
#  [3 2 0 2 1]
#  [2 2 3 3 1]
#  [0 0 1 2 3]]    

index holds the indices into the array cutoffs associated with each element of A. Note that we had to sort B, since np.searchsorted expects a sorted array.

index is almost the desired answer, except that we want to map

1-->1
3-->0
0-->3
2-->2

np.argsort provides us with this mapping:

print(np.argsort(B))
# [3 1 2 0]
print(np.argsort(B)[1])
# 1
print(np.argsort(B)[3])
# 0
print(np.argsort(B)[0])
# 3
print(np.argsort(B)[2])
# 2

print(np.argsort(B)[index])
# [[1 1 0 3 2]
#  [0 2 3 2 1]
#  [2 2 0 0 1]
#  [3 3 1 2 0]]

So, as a one-liner, the answer is:

np.argsort(B)[np.sort(B).searchsorted(A)]

Calling both np.sort(B) and np.argsort(B) is inefficient since both operations amount to sorting B. For any 1D-array B,

np.sort(B) == B[np.argsort(B)]

So we can compute the desired result a bit faster using

key=np.argsort(B)
result=key[B[key].searchsorted(A)]
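
As a quick sanity check, the following sketch confirms the sort identity and the final result on the sample data. It assumes every value in A actually occurs in B. If one does not, searchsorted still returns an insertion position, so the lookup can silently give a wrong index or raise an IndexError. The np.isin guard is an addition for illustration only.

```python
import numpy as np

A = np.array([[32, 32, 99,  9, 45],
              [99, 45,  9, 45, 32],
              [45, 45, 99, 99, 32],
              [ 9,  9, 32, 45, 99]])
B = np.array([99, 32, 45, 9])

# Guard (illustrative): the approach is only valid if all of A's values are in B.
all_present = bool(np.isin(A, B).all())

key = np.argsort(B)
# Sorting B directly and indexing B with its argsort give the same array.
identity_holds = np.array_equal(np.sort(B), B[key])

result = key[B[key].searchsorted(A)]
# Looking the indices back up in B must reproduce A.
roundtrip_ok = np.array_equal(B[result], A)

print(all_present, identity_holds, roundtrip_ok)
# True True True
print(result)
# [[1 1 0 3 2]
#  [0 2 3 2 1]
#  [2 2 0 0 1]
#  [3 3 1 2 0]]
```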
unutbu