When I load the MNIST dataset from Keras, I get 4 variables:

```python
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
```

The shape of `x_train` is `(60000, 28, 28)`, which makes sense because it contains 60,000 28x28 pictures. The shape of `y_train` is just `(60000,)`, which shows that it is a one-dimensional vector containing the numeric target labels (0-9).
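Here is the full snippet I'm running to check this (using the TensorFlow-bundled Keras):

```python
from tensorflow import keras

# MNIST: 28x28 grayscale digit images, integer labels 0-9
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
```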
To run digit classification, neural networks generally output a one-hot encoded vector, which would have ten dimensions. I thought I needed to use `to_categorical` to convert the y targets from numeric to categorical so that the training labels would match the shape of the network's output, which would presumably be `(60000, 10)`.
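That conversion itself is straightforward; this is what I expected to need, something like:

```python
from tensorflow.keras.utils import to_categorical

# Turn integer labels (0-9) into one-hot rows of length 10
y_train_onehot = to_categorical(y_train, num_classes=10)
print(y_train_onehot.shape)  # (60000, 10)
```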
But in a few examples I've found online, `to_categorical` was never used to reshape the training vector. `y_train.shape` remained `(60000,)` while the network's output layer was

```python
model.add(Dense(10, activation="softmax"))
```

which outputs a 10-dimensional vector of class probabilities. And then they simply trained the model on `y_train` without issue:

```python
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```
How is this possible? Wouldn't the network's output, which has shape `(60000, 10)`, be incompatible with `(60000,)`? Or does Keras automatically convert the categorical output to numeric?
EDIT: To be extra clear, I know how to one-hot encode the labels; my question is why these examples didn't do that. The network trained fine without one-hot encoding the target classes, even though its output was clearly ten-dimensional.
EDIT: Roshin was right. This is simply an effect of using the `sparse_categorical_crossentropy` loss, as opposed to `categorical_crossentropy`. With the sparse loss, Keras compares the 10-way softmax output against integer class indices directly, so the labels never need to be one-hot encoded.
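For anyone who finds this later, here is a minimal sketch of the two equivalent setups (the architecture is illustrative, not the exact model from the examples I found):

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

def build_model():
    # Simple dense classifier with a 10-way softmax output
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        Flatten(),
        Dense(128, activation="relu"),
        Dense(10, activation="softmax"),
    ])

# Option 1: integer labels, shape (60000,), with sparse_categorical_crossentropy
model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))

# Option 2: one-hot labels, shape (60000, 10), with categorical_crossentropy
model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, to_categorical(y_train, 10), epochs=2,
          validation_data=(x_test, to_categorical(y_test, 10)))
```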