
When I load the MNIST dataset from Keras, I get four variables:

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

The shape of x_train is (60000, 28, 28), which makes sense because it contains 60,000 images of 28x28 pixels.

The shape of y_train is just (60000,), which shows that it is a one-dimensional vector containing the numeric target labels (0-9).

For digit classification, a neural network generally outputs a ten-dimensional vector, one entry per class (the same shape as a one-hot encoding). I thought I needed to use to_categorical to convert the y targets from integer labels to one-hot vectors, so that the targets would match the shape of the network's output; presumably that would make y_train's shape (60000, 10).
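A minimal sketch of what I mean (assuming the tensorflow.keras API):

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# to_categorical turns each integer label (0-9) into a one-hot row of length 10
y_train_onehot = keras.utils.to_categorical(y_train, num_classes=10)
print(y_train.shape)         # (60000,)
print(y_train_onehot.shape)  # (60000, 10)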

But in a few examples I've found online, to_categorical was never used to reshape the training vector. y_train.shape remained (60000,) while the neural net's output layer was

 model.add(Dense(10, activation="softmax"))

which outputs a 10-dimensional vector, the same shape as a one-hot encoding.

They then simply trained the model on y_train without issue:

model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))

How is this possible? Wouldn't the network's output, which would have shape (60000, 10), be incompatible with targets of shape (60000,)? Or does Keras automatically convert the categorical output to numeric?

EDIT: To be extra clear, I know how to one-hot encode it, but my question is why they didn't do that. In the example, the net worked without one-hot encoding the target classes, while the net's output was clearly one-hot encoded.

EDIT: Roshin was right. This is simply an effect of using the sparse_categorical_crossentropy loss, as opposed to categorical_crossentropy.

user3576467

2 Answers


Change the loss function to

loss = 'sparse_categorical_crossentropy'

This will work, and you don't have to change the shape of your target data.
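A minimal sketch of what that looks like end to end (the model below is an assumption based on the single layer shown in the question, not the exact code from those examples):

from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = keras.Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(10, activation="softmax"),
])

# sparse_categorical_crossentropy compares integer class ids against the
# softmax output, so y_train can keep its (60000,) shape
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))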

Roshin Raphel
  • You're right. But why? What's the difference between categorical and sparse? I know that's been answered before but why would it affect data shape? – user3576467 Jun 20 '20 at 20:56
  • I think this question will answer your doubts: https://stackoverflow.com/questions/44674847/what-are-the-differences-between-all-these-cross-entropy-losses-in-keras-and-ten – Roshin Raphel Jun 20 '20 at 20:58

You can convert it to one-hot yourself by executing these lines of code:

import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, l_train), (x_test, l_test) = mnist.load_data()

# Build (num_samples, num_classes) arrays of zeros, then set a 1 at each label's index
y_train = np.zeros((l_train.shape[0], l_train.max()+1), dtype=np.float32)
y_train[np.arange(l_train.shape[0]), l_train] = 1
y_test = np.zeros((l_test.shape[0], l_test.max()+1), dtype=np.float32)
y_test[np.arange(l_test.shape[0]), l_test] = 1
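The same encoding can also be done with Keras' built-in helper; a sketch (assuming tensorflow.keras, and that model is the network from the question), where the matching loss for one-hot targets is categorical_crossentropy:

from tensorflow import keras

# Equivalent one-hot encoding using the built-in helper
y_train = keras.utils.to_categorical(l_train, num_classes=10)
y_test = keras.utils.to_categorical(l_test, num_classes=10)

# With one-hot targets the matching loss is the plain categorical_crossentropy
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])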
SELLAM
  • I know how to one-hot encode it, but my question is why they didn't do that. In the example, the net worked without one-hot encoding the target classes, while the net's output was clearly one-hot encoded – user3576467 Jun 20 '20 at 20:41
  • Maybe they trained the model using sparse_categorical_crossentropy instead of categorical_crossentropy. sparse_categorical_crossentropy is used if your target labels are integers to be compared with the softmax outputs – SELLAM Jun 20 '20 at 20:57