
I'm trying to get the 'logits' out of my Keras CNN classifier. I have tried the suggested method here: link.

First, I created two models to check the implementation:

  1. create_CNN_MNIST: a CNN classifier that returns softmax probabilities.
  2. create_CNN_MNIST_logits: a CNN with the same layers as in (1), with one twist in the last layer: the activation is changed to linear so the model returns logits.

Both models were fed the same MNIST train and test data. When I then applied softmax to the logits, I got a different output than from the softmax CNN.

I couldn't find a problem in my code. Could you suggest another method to extract the logits from the model?

The code:

import numpy as np
from tensorflow import keras
from tensorflow.keras import utils
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def create_CNN_MNIST_logits():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='linear'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    
    def my_categorical_crossentropy(y_true, y_pred):
        # note: this is (non-sparse) categorical cross-entropy, computed from the logits
        return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True)

    model.compile(optimizer=opt, loss=my_categorical_crossentropy, metrics=['accuracy'])
    return model

def create_CNN_MNIST():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# load data
X_train = np.load('./data/X_train.npy')
X_test = np.load('./data/X_test.npy')
y_train = np.load('./data/y_train.npy')
y_test = np.load('./data/y_test.npy')


# create models
model_softmax = create_CNN_MNIST()
model_logits = create_CNN_MNIST_logits()


pixels = 28
channels = 1
num_labels = 10

# Reshaping to format which CNN expects (batch, height, width, channels)
trainX_cnn = X_train.reshape(X_train.shape[0], pixels, pixels, channels).astype('float32')
testX_cnn = X_test.reshape(X_test.shape[0], pixels, pixels, channels).astype('float32')

# Normalize images from 0-255 to 0-1
trainX_cnn /= 255
testX_cnn /= 255

train_y_cnn = utils.to_categorical(y_train, num_labels)
test_y_cnn = utils.to_categorical(y_test, num_labels)


# train the models
model_logits.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
                          batch_size=32)
model_softmax.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
                          batch_size=32)

At the evaluation stage, I apply softmax to the logits to check whether the result is the same as the regular model's:

# predict
y_pred_softmax = model_softmax.predict(testX_cnn)
y_pred_logits = model_logits.predict(testX_cnn)

# apply softmax to the logits to get the same result as the regular CNN
y_pred_logits_activated = softmax(y_pred_logits)

Now I get different values in y_pred_logits_activated and y_pred_softmax, which leads to different accuracy on the test set.
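For reference, this is a minimal sketch of how I quantify the mismatch (assuming y_test still holds the integer class labels; the tolerance is arbitrary):

# sketch: quantify the mismatch between the two models' outputs
print(np.allclose(y_pred_softmax, y_pred_logits_activated, atol=1e-5))

# accuracy of each model on the test set (y_test holds integer labels)
acc_softmax = (y_pred_softmax.argmax(axis=1) == y_test).mean()
acc_logits = (y_pred_logits_activated.argmax(axis=1) == y_test).mean()
print(acc_softmax, acc_logits)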

1 Answer


Your models are probably being trained differently: make sure to set the seed prior to both fit commands so that they're initialised with the same weights and get the same train/val split. Also, the softmax might be incorrect:

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x)
    # keepdims=True is needed so the division broadcasts per row
    return e_x / e_x.sum(axis=1, keepdims=True)

This is equivalent to subtracting the max, which is only done for numerical stability (https://stackoverflow.com/a/34969389/10475762); the important fix is that the axis should be 1 if your matrix is of shape [batch, outputs].
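A minimal sketch of both suggestions together (the set_seed helper and the seed value are illustrative, not part of your code):

import random
import numpy as np
import tensorflow as tf

def set_seed(seed=0):
    # illustrative helper: seed Python, NumPy and TensorFlow so both
    # models start from the same weights and see the same shuffling
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

def softmax(x):
    # softmax over the class axis of a [batch, outputs] matrix;
    # subtracting the row max only adds numerical stability
    e_x = np.exp(x - x.max(axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)

set_seed(0)
model_softmax = create_CNN_MNIST()
set_seed(0)
model_logits = create_CNN_MNIST_logits()
# then fit both models with identical arguments, as in your code

With identical initial weights and data ordering, softmax(model_logits.predict(x)) should closely match model_softmax.predict(x), though small floating-point differences can still accumulate over training.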
