Saliency maps of neural networks (using Keras)

Question

I have a fully connected multilayer perceptron trained in Keras. I feed it an N-dimensional feature vector and it predicts one out of M classes for the input vector. The training and prediction is working well. Now I want to analyze what part of the input feature vector is actually responsible for a particular class.
For example, lets say there are two classes A and B , and an input vector f. The vector f belongs to class A and the network predicts it correctly - the output of the network is A=1 B=0. Because I have some domain knowledge, I know that the entire f is actually not responsible for f belonging to A, only a certain part inside f is responsible for that. I want to know if the neural network has captured that. Drawing a correspondence to images, if an image I has a cat in it (with some grassy background) and a trained network predicts that correctly, then the network must know that the entire image is actually not a cat; the network internally knows the location of the cat in the image. Similarly, in my case, the network knows what part of f makes it belong to class A. I want to know what part that is.

I searched around, and believe what I want to do is called finding Saliency Maps for my network, for a given input. Is that correct?
If I've understood it correctly, Saliency Maps are simply (change in output)/(change in input), and can be found by simply 1 backpropagation operation where I find the derivative of output with respect to the input.
I found the following code snippet for doing this in Keras, but I'm not really sure if it is correct:

   inp = model.layers[0].get_input()
   outp = model.layers[-1].get_output()
   max_outp = T.max(outp, axis=1)
   saliency = theano.grad(max_outp.sum(), wrt=inp)

In the above code, when computing the gradient, is the backpropagation actually happening? The output is a non-linear function of the input, so the only way to find the gradient is to do backprop. But in the above code, there is nothing to connect theano and the network, how is theano "aware" of the network here? As far as I know, when computing gradients with Theano, we first define the function in terms of input and output. So theano has to know what that non-linear function is. I don't think that is true in the above snippet..

Update: The above code doesn't work because I have a fully connected MLP. It gives an error saying "Dense object doesn't have get_output()" . I have the following Keras function, which computes output of network given input. I want to now find gradient of this function wrt the input:

    get_output = K.function([self.model.layers[0].input],[self.model.layers[-1].output])

Which version of Keras do you use? – Marcin Możejko May 01 '16 at 21:13 — Marcin Możejko, May 01 '16 at 21:13

score 7 · Accepted Answer · answered May 01 '16 at 23:40

I found the solution:

    get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
    fx = theano.function( [model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
    grad = fx([input_feature])

Saliency maps of neural networks (using Keras)

1 Answers1

Linked