I start with an array a containing N unique values (product(a.shape) >= N).
I need to find the array b that holds, at the position of each element of a, the index 0 .. N-1 of that element in the sorted list of unique values of a.
As an example,
import numpy as np
np.random.seed(42)
a = np.random.choice([0.1, 1.3, 7, 9.4], size=(4, 3))
print(a)
prints a as
[[ 7.   9.4  0.1]
 [ 7.   7.   9.4]
 [ 0.1  0.1  7. ]
 [ 1.3  7.   7. ]]
The unique values are [0.1, 1.3, 7.0, 9.4], so the required outcome b would be
[[2 3 0]
 [2 2 3]
 [0 0 2]
 [1 2 2]]
(For example, the value at a[0,0] is 7.0, which has index 2 in the sorted unique values; thus b[0,0] == 2.)
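Put differently, b is the array for which indexing the sorted unique values with b gives back a. Here is a minimal sketch of that check, with a and the expected b typed in by hand from the example above so that it does not depend on the random seed (the names u and b_expected are only for illustration):

```python
import numpy as np

# The example array, written out literally.
a = np.array([[7.0, 9.4, 0.1],
              [7.0, 7.0, 9.4],
              [0.1, 0.1, 7.0],
              [1.3, 7.0, 7.0]])

# The required outcome, copied from the example.
b_expected = np.array([[2, 3, 0],
                       [2, 2, 3],
                       [0, 0, 2],
                       [1, 2, 2]])

u = np.unique(a)  # sorted unique values of a

# Defining property of b: looking each index up in u reproduces a.
assert np.array_equal(u[b_expected], a)
```

Any candidate solution can be checked against this property.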
Since numpy does not have an index function, I could do this using a loop, either looping over the input array, like this:
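u = np.unique(a).tolist()  # sorted unique values as a Python list
af = a.flatten()
b = np.empty(len(af), dtype=int)
for i in range(len(af)):
    # list.index returns the position of the value in the sorted list
    b[i] = u.index(af[i])
b = b.reshape(a.shape)
print(b)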
or looping over the unique values as follows:
u = np.unique(a)  # sorted unique values
b = np.empty(a.shape, dtype=int)
for i in range(len(u)):
    # write index i wherever a equals the i-th unique value
    b[a == u[i]] = i
print(b)
I suppose that the second way, looping over the unique values, is already more efficient than the first in cases where not all values in a are distinct; still, it involves a Python-level loop and is rather inefficient compared to vectorized operations.
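To put a rough number on that supposition, the two loops can be wrapped in functions and timed. This is only a sketch: the function names, the array size and the repeat count are arbitrary choices made for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

def index_by_elements(a):
    # First approach: one list lookup per element of a.
    u = np.unique(a).tolist()
    af = a.flatten()
    b = np.empty(len(af), dtype=int)
    for i in range(len(af)):
        b[i] = u.index(af[i])
    return b.reshape(a.shape)

def index_by_uniques(a):
    # Second approach: one masked assignment per unique value.
    u = np.unique(a)
    b = np.empty(a.shape, dtype=int)
    for i in range(len(u)):
        b[a == u[i]] = i
    return b

rng = np.random.RandomState(0)
a_big = rng.choice([0.1, 1.3, 7, 9.4], size=(200, 300))

# Both approaches should produce the same index array.
assert np.array_equal(index_by_elements(a_big), index_by_uniques(a_big))

t_elements = timeit.timeit(lambda: index_by_elements(a_big), number=3)
t_uniques = timeit.timeit(lambda: index_by_uniques(a_big), number=3)
print("per-element loop: %.4f s" % t_elements)
print("per-unique loop:  %.4f s" % t_uniques)
```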
So my question is: What is the most efficient way of obtaining the array b filled with the indices of the unique values of a?