How to apply Grad-CAM on my trained Model?

Question

I have trained a model to figure out if an image is right or wrong (just 2 classes) and I have used the guide on keras website for GradCAM. The input images are reshaped to (250, 250) and then normalized by dividing the image numpy array by 255. This is then passed for the training of the model. Here is the code attached. I am encountering the following error: Invalid reduction dimension (1 for input with 1 dimension(s) [Op:Mean]

Data

image = cv2.imread("/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg")
image = cv2.resize(image, (250, 250))
image = image.astype('float32') / 255
image = np.expand_dims(image, axis=0)

Model

model = Sequential()

#Adding first convolutional layer
model.add(Conv2D(64, (3,3), activation="relu"))

#Adding maxpooling
model.add(MaxPooling2D((2,2)))

#Adding second convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding third convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding fourth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding fifth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Flattening the layers
model.add(Flatten())

model.add(Dense(128, input_shape = X.shape[1:], activation="relu"))

#Output Layer. Since, the image is right/wrong, only 2 neurons is needed.
model.add(Dense(2, activation = "softmax"))
# model.add(Dense(2, activation = "sigmoid"))

model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])

GradCAM

def get_img_array(img_path, size):
    # `img` is a PIL image of size 299x299
    img = keras.preprocessing.image.load_img(img_path, target_size=size)
    # `array` is a float32 Numpy array of shape (299, 299, 3)
    array = keras.preprocessing.image.img_to_array(img)
    # We add a dimension to transform our array into a "batch"
    # of size (1, 299, 299, 3)
    array = np.expand_dims(array, axis=0)
    print(array.shape)
    return array

def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )

    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]

    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)

    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # pooled_grads = tf.reduce_mean(grads)

    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # For visualization purpose, we will also normalize the heatmap between 0 & 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()

Adjusting parameters

img_size = (250, 250)
preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions

last_conv_layer_name = "dense_1"

# The local path to our target image
img_path =  "/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg"

preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions


display(Image(img_path))

Running them

# Prepare image
img_array = preprocess_input(get_img_array(img_path, size=img_size))

# Make model
model = model_builder(weights="imagenet")

# Remove last layer's softmax
model.layers[-1].activation = None

# Print what the top predicted class is
preds = model.predict(img_array)
print("Predicted:", decode_predictions(preds, top=1)[0])

# Generate class activation heatmap
heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name)

# Display heatmap
plt.matshow(heatmap)
plt.show()

Here is the error:

I would really be grateful if anyone can help me out here.

Try [this](https://stackoverflow.com/a/66189774/9215780). Let me know if it works for you. — Innat, Apr 02 '21 at 06:39
Hey, thanks for the link but I got this error : ```InvalidArgumentError: Invalid reduction dimension (1 for input with 1 dimension(s) [Op:Mean].``` One thing I have to ask you is that your model has 10 output neurons whereas mine has only 2 output neurons. So, do you think this can affect the code? Error: https://imgur.com/a/Wg9QfOS — Yansh, Apr 02 '21 at 08:25
How you tried on the mentioned link? If possible, please share a colab link. — Innat, Apr 02 '21 at 08:28
https://drive.google.com/drive/folders/149Ax5LUkgJSxhesIqPiJOVMMv6tzAzyx?usp=sharing — Yansh, Apr 02 '21 at 08:37
Sorry for the late reply. Please have a look at the answer. Hope it will help you. — Innat, Apr 04 '21 at 09:53

score 2 · Accepted Answer · answered Apr 04 '21 at 09:52

Here is the complete demo working code. Similar to you, I will be classifying 2 classes with softmax and use the sparse_categorical_crossentropy loss function. There is some issue in your model definition, so I will write my own, don't worry very simple and alike.

To make it useful, I will answer end-to-end, hopefully, new visitors may find it useful too. I will be classifying whether a digit even or odd - a binary classification. And lastly, we will use a raw sample image and preprocess it and try to find a class activation map to see where our model makes its attention. So, here is the content:

Prepare 2 classes data set
Build 2 class classifier and train it
Find Grad-CAM

Data Sets

We will use MNIST and modify it for binary classes - even number or odd number.

import tensorflow as tf 

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()

# x set 
x_train = tf.expand_dims(x_train, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [84,84]) 

# y set 
# odd : 0 ; even : 1
y_train = (y_train % 2 == 0).astype(int)

print(x_train.shape, y_train.shape)
(60000, 84, 84, 1) (60000,)

Model

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dropout,
                                     Dense, GlobalAveragePooling2D)

model = Sequential()

#Adding first convolutional layer
model.add(Conv2D(16, kernel_size=(3,3), input_shape = (84,84,1)))  
model.add(Conv2D(32, kernel_size=(3,3), activation="relu"))  
model.add(Conv2D(64, kernel_size=(3,3), activation="relu"))  
model.add(Conv2D(128, kernel_size=(3,3), activation="relu"))  
model.add(GlobalAveragePooling2D())   
model.add(Dropout(0.5))         
model.add(Dense(2, activation=tf.nn.softmax))       
model.summary()

model.compile(optimizer = "adam", 
              loss = "sparse_categorical_crossentropy", 
              metrics = ["accuracy"])

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_16 (Conv2D)           (None, 82, 82, 16)        160       
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 80, 80, 32)        4640      
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 78, 78, 64)        18496     
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 76, 76, 128)       73856     
_________________________________________________________________
global_average_pooling2d_3 ( (None, 128)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 258       
=================================================================
Total params: 97,410
Trainable params: 97,410
Non-trainable params: 0

Note that, for Grad-CAM we will be using layer conv2d_19 - right before the GAP layer. (You may choose another layer that gives 2D feature maps). Ok, training the model:

model.fit(x_train, y_train, epochs=20, batch_size=256, verbose=2)

Epoch 1/20
235/235 - 32s - loss: 0.5549 - accuracy: 0.7203
Epoch 2/20
235/235 - 29s - loss: 0.4586 - accuracy: 0.7890
Epoch 3/20
235/235 - 29s - loss: 0.4160 - accuracy: 0.8132
Epoch 4/20
235/235 - 29s - loss: 0.4000 - accuracy: 0.8226
Epoch 5/20
235/235 - 29s - loss: 0.3830 - accuracy: 0.8304
Epoch 6/20
235/235 - 29s - loss: 0.3690 - accuracy: 0.8367
Epoch 7/20
235/235 - 29s - loss: 0.3619 - accuracy: 0.8421
...
...

Grad-CAM

We will be using exact same GradCAM classes from my other answer, here. No differences. After that, we will load a sample image:

import matplotlib.pyplot as plt

image = cv2.imread('/content/5.png', 0)
image = cv2.bitwise_not(image)          # ATTENTION 
image = cv2.resize(image, (84, 84))

# checking how it looks 
plt.imshow(image, cmap="gray")
plt.show()

image = tf.expand_dims(image, axis=-1)     # from 84 x 84 to 84 x 84 x 1 
image = tf.divide(image, 255)              # normalize
image = tf.reshape(image, [1, 84, 84, 1])  # reshape to add batch dimension

print(image.shape) # (1, 84, 84, 1)

Note: You may wonder why I am using image = cv2.bitwise_not(image) this. Because for me, my sample looks as follows (background white and foreground black), which is not look like the data set (MNIST) on which the model is trained on. That's why I had to use the bitwise_not operation on my sample - but you may don't need to do this for your case. For more details, please see my another answer.

Ok, we know what is this image. And I also added some lines to the image right below to see what model reacts on them VS the main ROI. Let's see what the model would say, whether it's an even number (1) or an odd number (0).

preds = model.predict(image) 
i = np.argmax(preds[0])
i # 0 - great model correctly recognize, this is an odd number

Finding class activation maps

# `conv2d_19` - remember this, we talked about it earlier 
icam = GradCAM(model, i, 'conv2d_19') 
heatmap = icam.compute_heatmap(image)
heatmap = cv2.resize(heatmap, (84, 84))

image = cv2.imread('/content/5.png')
image = cv2.resize(image, (84, 84))
print(heatmap.shape, image.shape)

(heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)

Visualize

fig, ax = plt.subplots(1, 3)
fig.set_size_inches(20,20)

ax[0].imshow(heatmap)
ax[1].imshow(image)
ax[2].imshow(output)

You're using Guided Grad-CAM, which is not the same as Grad-CAM. In a paper called Sanity Checks for Saliency Maps, Adebayo et al show that Guided Grad-CAM is simply a function of the input and therefore doesn't show what the model has learnt. This problem doesn't affect Grad-CAM. Moreover, input-gradients in general have been argued as being very easy to shift arbitrarily, meaning that any saliency method using input-gradients are not actually reflecting the model's learnt behaviour. In other words, Guided Backpropagation and Guided Grad-CAM are a big no no these day. — kelkka, Apr 20 '21 at 13:53

How to apply Grad-CAM on my trained Model?

1 Answers1

Data Sets

Model

Grad-CAM