
When I was working on my machine learning project, I was looking for a line of code to turn my labels into one-hot vectors. I came across this nifty line of code from u/benanne on Reddit.

np.eye(n_labels)[target_vector]

For example, given target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2]), it returns the one-hot encoded values:

np.eye(5)[target_vector]
Out: 
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.],
       ..., 
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  0.]])
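A self-contained version of the example above, runnable as-is. It also passes dtype=int to np.eye so the one-hot rows hold integers instead of the default float64 values; that variant is an illustrative addition, not part of the original one-liner.

```python
import numpy as np

# Labels from the example above; there are 5 classes (0 through 4)
target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2])
n_labels = 5

# Default np.eye returns float64 rows, matching the output shown above
one_hot = np.eye(n_labels)[target_vector]
print(one_hot.shape)  # (8, 5)

# Optional: integer one-hot rows via the dtype argument of np.eye
one_hot_int = np.eye(n_labels, dtype=int)[target_vector]
print(one_hot_int[:3])  # rows for labels 1, 4, 2
```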

While it definitely does work, I'm not sure how or why it works.

ayhan
Suha Hussain
  • I've never seen a method this elegant for generating OHVs. – cs95 Jul 12 '17 at 23:12
  • It's the second answer on the [top voted SO question.](https://stackoverflow.com/questions/29831489/numpy-1-hot-array). Should probably be the accepted answer since it handles n-d 1HA's so well – Daniel F Jul 13 '17 at 06:58
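Daniel F's point about n-d label arrays follows from how integer-array indexing works: the result has shape target.shape + (n_labels,). A small sketch with a hypothetical 2-D batch of labels:

```python
import numpy as np

n_labels = 4
# Hypothetical 2-D batch of labels: 2 sequences of 3 labels each
labels = np.array([[0, 3, 1],
                   [2, 2, 0]])

one_hot = np.eye(n_labels)[labels]
print(one_hot.shape)  # (2, 3, 4): one one-hot vector per label

# Each innermost vector has its single 1 at the label's position
print(one_hot[0, 1])  # [0. 0. 0. 1.]
```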

2 Answers


It's rather simple. np.eye(n_labels) creates an identity matrix of size n_labels; you then use your target_vector to select rows from that matrix, one per target, namely the row whose index equals that target's value. Since each row of an identity matrix contains exactly one 1 and the rest 0s, and row k has its 1 in column k, each selected row is a valid one-hot encoding of its label.
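To see that row selection really produces one-hot vectors, the sketch below builds the same array the long way: start from zeros and set a single 1 per row at column target_vector[i]. The helper name one_hot_manual is hypothetical, written only for this comparison.

```python
import numpy as np

def one_hot_manual(target_vector, n_labels):
    """Hypothetical helper: build one-hot rows without np.eye."""
    out = np.zeros((len(target_vector), n_labels))
    # Row i gets a 1 in column target_vector[i], i.e. it equals identity row target_vector[i]
    out[np.arange(len(target_vector)), target_vector] = 1
    return out

target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2])
identity = np.eye(5)

# Row k of the identity matrix is the one-hot vector for label k
print(identity[4])  # [0. 0. 0. 0. 1.]

# Selecting identity rows by label gives the same array as the manual version
print(np.array_equal(identity[target_vector], one_hot_manual(target_vector, 5)))  # True
```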

JohanL
  • This is indexing a NumPy array using another array as described here: https://docs.scipy.org/doc/numpy/user/basics.indexing.html#index-arrays. The first array is the Identity matrix of size _n_labels_. The second array selects the one-hot row corresponding to each target. – zardosht Oct 11 '18 at 10:04
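The index-array rule zardosht links to can be checked directly: indexing the identity matrix with an integer array is equivalent to np.take along axis 0. The sketch also shows two edge cases of standard NumPy indexing: a label of n_labels or more raises an IndexError, while a negative label silently wraps around (-1 selects the last row), so labels are assumed to already lie in 0..n_labels-1.

```python
import numpy as np

eye = np.eye(5)
target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2])

# Integer-array indexing on the first axis is the same as np.take along axis 0
print(np.array_equal(eye[target_vector], np.take(eye, target_vector, axis=0)))  # True

# A label >= n_labels is out of bounds and raises IndexError
try:
    eye[np.array([5])]
except IndexError as err:
    print("IndexError:", err)

# Negative labels wrap around like ordinary NumPy indices: -1 picks the last row
print(np.array_equal(eye[np.array([-1])], eye[[4]]))  # True
```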

Indexing with a list selects whole rows: ndarray[[0]] selects the first row of the ndarray, ndarray[[1]] the second, and so on.

import numpy as np

t = np.arange(9).reshape(3, 3)
print(t)
print(t[[1]])  # list index: selects row 1, keeping it 2-D

The output is:

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[3 4 5]]
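The distinction this answer relies on is between an integer index and a list (or array) index: t[1] returns row 1 as a 1-D array, while t[[1]] keeps a leading axis, and a longer index list returns rows in the order given, repeats included. That repetition is exactly what np.eye(n_labels)[target_vector] uses when several targets share a label.

```python
import numpy as np

t = np.arange(9).reshape(3, 3)

print(t[1].shape)    # (3,): a plain integer drops the row axis
print(t[[1]].shape)  # (1, 3): a list index keeps it

# Rows come back in index order, and repeated indices repeat rows (rows 2, 0, 2)
print(t[[2, 0, 2]])
```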
Jason