Can someone please explain what is meant by converting class labels from scalars to one-hot vectors? What is the purpose of labels_dense.shape[0], and why are the entries of labels_one_hot.flat set to 1 at the end?

import numpy as np

def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors"""
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot

eman

2 Answers

I think you might find this answer helpful; it describes how one-hot encoding works in machine learning: One Hot Encoding for Machine learning
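As a rough illustration of what "scalar to one-hot" means, here is a minimal sketch. It assumes NumPy and integer labels in the range 0 to num_classes - 1; the label values and the class count of 3 are made up for the example and do not come from the question.

```python
import numpy as np

labels = np.array([2, 0, 1])   # scalar (dense) class labels, example values
num_classes = 3                # assumed number of classes

# Row i of the identity matrix has a single 1 at column i,
# so indexing the identity matrix by the labels gives one one-hot row per label.
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each scalar label becomes a vector of length num_classes that is all zeros except for a 1 at the position of the class.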

Community
Pete Warden

I came across the same function and wrote a simpler version to understand it. I use the digits 0 to 4, which represent 5 classes.

What is the purpose of labels_dense.shape[0]?

It returns the length of the first axis of labels_dense, that is, the number of labels, which is 10 in this example.
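A small sketch of this, assuming NumPy and the same ten labels used in the example in this answer. The column-shaped variant is an added illustration, not something from the question.

```python
import numpy as np

labels_dense = np.array([1, 2, 3, 4, 3, 4, 3, 2, 1, 0])

# shape is a tuple of axis lengths; element 0 is the number of labels.
num_labels = labels_dense.shape[0]
print(labels_dense.shape)   # (10,)
print(num_labels)           # 10

# If the labels arrive as a column of shape (10, 1), shape[0] is still 10,
# and ravel() flattens them back to shape (10,).
column = labels_dense.reshape(-1, 1)
print(column.shape[0], column.ravel().shape)
```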

What does this code mean?

labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1

It is the logic that puts a 1 in the correct position, as you can see in the output. The one-hot array is treated as one flat sequence, and each position is counted from its beginning: the offset of the row (row number times number of classes) plus the label.

So to represent the digit 0 in the last row, position 45 of the flat array (row offset 9 * 5 = 45, plus label 0) should be 1. This corresponds to the 0th element of the last vector.

So

[1. 0. 0. 0. 0.]

is the one-hot representation for the digit '0' when we have 5 classes.
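The position arithmetic can be checked with a short sketch. It assumes NumPy and the 10 x 5 array from this example; the variable names row, label and flat_index are chosen here for illustration.

```python
import numpy as np

num_classes = 5
one_hot = np.zeros((10, num_classes))

row, label = 9, 0                        # last row holds the digit 0
flat_index = row * num_classes + label   # 9 * 5 + 0 = 45
one_hot.flat[flat_index] = 1             # .flat indexes the array in row-major order

print(flat_index)
print(one_hot[row])

# np.unravel_index converts the flat position back into (row, column).
row_col = tuple(int(i) for i in np.unravel_index(flat_index, one_hot.shape))
print(row_col)
```

The same formula, applied to all rows at once, is what index_offset + labels_dense.ravel() computes.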

import numpy

def onehot():
    # Ten labels drawn from the 5 classes 0..4
    labels_dense = numpy.array([1,2,3,4,3,4,3,2,1,0])
    print('Shape of labels_dense is ' + str(labels_dense.shape))

    # Flat position where each row of the (10, 5) array starts
    index_offset = numpy.arange(10) * 5
    print('Index offset is \n' + str(index_offset))

    labels_one_hot = numpy.zeros((10, 5))
    print('index_offset + labels_dense.ravel() is\n' + str(index_offset + labels_dense.ravel()))

    # Row start + label = flat position of the 1 in each row
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    print('One-hot labels are \n' + str(labels_one_hot))

onehot()

The output is:

Shape of labels_dense is (10,)
Index offset is 
[ 0  5 10 15 20 25 30 35 40 45]
index_offset + labels_dense.ravel() is
[ 1  7 13 19 23 29 33 37 41 45]
One-hot labels are 

[[0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]]
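For comparison, the same result can be obtained without flat positions by indexing rows and columns directly. This is only a sketch under the same assumptions (NumPy, integer labels below num_classes); the function name one_hot_by_row_col is hypothetical and is not part of the original code.

```python
import numpy as np

def dense_to_one_hot(labels_dense, num_classes=10):
    """The function from the question, using flat positions."""
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot

def one_hot_by_row_col(labels_dense, num_classes=10):
    """Hypothetical alternative: set element (i, labels[i]) to 1 for every row i."""
    labels = labels_dense.ravel()
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1
    return out

labels = np.array([1, 2, 3, 4, 3, 4, 3, 2, 1, 0])
a = dense_to_one_hot(labels, num_classes=5)
b = one_hot_by_row_col(labels, num_classes=5)
print(np.array_equal(a, b))
```

Both versions write exactly one 1 per row; the flat version just expresses the (row, column) pair as a single number.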
Mohan Radhakrishnan