
I have an image classifier with 2 outputs. I want to find the most important pixels in an image, i.e. the pixels that have the most influence on the model output and whose perturbation would change it the most.

My last layer is as follows:

model.add(Dense(2, activation="softmax"))
model.compile(loss="categorical_crossentropy")

So the model has 2 outputs, y1 and y2. For an input image x = x0, I am trying to compute dy1/dx|x=x0 and dy2/dx|x=x0.
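
To make the quantities concrete, here is a minimal sketch on a toy stand-in, not my actual Keras model: a single linear layer followed by softmax, written in plain Python. The names `W`, `b`, `forward` and `output_gradients`, and all the numbers, are made up for illustration. For the real model I assume the same gradients would come from automatic differentiation rather than a hand-written Jacobian.

```python
import math


def softmax(z):
    # Subtract the max logit for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(W, b, x):
    # Logits z_k = sum_i W[k][i] * x[i] + b[k], then softmax.
    z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(z)


def output_gradients(W, b, x):
    """Return (y, grads) where grads[k][i] = d y_k / d x_i evaluated at x."""
    y = forward(W, b, x)
    n_cls, n_in = len(W), len(x)
    grads = []
    for k in range(n_cls):
        # Softmax Jacobian: dy_k/dz_j = y_k * (delta_kj - y_j).
        dy_dz = [y[k] * ((1.0 if j == k else 0.0) - y[j]) for j in range(n_cls)]
        # Chain rule through the linear layer: dz_j/dx_i = W[j][i].
        grads.append(
            [sum(dy_dz[j] * W[j][i] for j in range(n_cls)) for i in range(n_in)]
        )
    return y, grads


# Hypothetical toy model: 2 classes, a flattened 4-"pixel" input.
W = [[0.5, -1.0, 0.25, 2.0], [-0.5, 0.3, 1.0, -1.5]]
b = [0.1, -0.2]
x0 = [0.2, 0.7, 0.1, 0.9]

y, (g1, g2) = output_gradients(W, b, x0)
```

Here `g1` and `g2` play the role of dy1/dx|x=x0 and dy2/dx|x=x0, each with the same size as the input. In this 2-class toy case they are exact negatives of each other, because y1 + y2 = 1.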

I have the following questions:

a) Can I compute the gradients using the softmax output, or should I use the logits of the model?

b) For an image x0, the prediction is class 1 (i.e. y1 > y2). After computing the gradients, I get two arrays (each the same size as the input image) corresponding to dy1/dx|x=x0 and dy2/dx|x=x0. How do I use these to identify the pixels that have the most influence on the model output, ideally in a way that also works when the number of output classes is > 2?

c) How do these gradients differ from the gradient of the loss with respect to the input?

I would appreciate any clarification or code on this.

ml-user