I found a good write-up on this topic:
"How to measure importance of inputs" by Warren S. Sarle, SAS Institute Inc., Cary, NC, USA
ftp://ftp.sas.com/pub/neural/importance.html
Briefly:
- Summing weights does not work.
- Summing normalized weights does not work.
- Summing gradients does not work well.
- Removing inputs one after another (zeroing them or setting them to their mean) and re-training works, but takes a lot of time (a rough sketch of this baseline follows the list).
- Summing small finite differences of the output with respect to each input works pretty well!
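For the "remove one input at a time" baseline from the list, a rough sketch could look like the one below. The function name, the `targets` tensor and the `criterion` (e.g. nn.MSECriterion) are my own assumptions, not from the cited write-up, and the costly re-training step is only marked by a comment:

require 'torch'
require 'nn'

-- Sketch of the ablation baseline: replace one input with its mean value,
-- (re-train,) and measure how much the loss degrades.
local function ablation_importance(model, inputs, targets, criterion)
    local inputs_count = inputs:size(2)
    local baseline_loss = criterion:forward(model:forward(inputs), targets)
    local importance = torch.zeros(inputs_count)
    for i = 1, inputs_count do
        local ablated = inputs:clone()
        -- Set column i to its mean (on normalized data this is close to zeroing it).
        ablated[{{},{i,i}}]:fill(inputs[{{},i}]:mean())
        -- A faithful version would re-train the model on `ablated` here,
        -- which is exactly what makes this approach so slow; omitted in this sketch.
        local loss = criterion:forward(model:forward(ablated), targets)
        importance[i] = loss - baseline_loss
    end
    return importance
end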
Now, briefly, about the last method, which is the one I prefer to use:
For output function Y = f( X1, X2, X3), you could compute:
D1 = f( X1+h, X2, X3) - f( X1, X2, X3)
D2 = f( X1, X2+h, X3) - f( X1, X2, X3)
D3 = f( X1, X2, X3+h) - f( X1, X2, X3)
Averaging the absolute values of these differences over all input samples gives a good estimate of each input's importance.
This is how I do it in Lua Torch:
Note 1: I take squared differences instead of absolute values.
Note 2: My input matrix is normalized, which is why I can choose values of h in [-1, 1].
-- Assumes `model` (a trained nn module) and `inputs` (a samples x features
-- Tensor) already exist; see the setup sketch below.
local samples_count = inputs:size(1)
local inputs_count = inputs:size(2)

-- Baseline outputs; clone() because forward() reuses the module's output buffer.
local outputs = model:forward(inputs):clone()
local importance = torch.zeros(inputs_count)

print("Processing inputs 1 to "..tostring(inputs_count)); io.flush()
for i = 1, inputs_count do
    io.write("\rProcessing "..tostring(i)); io.flush()
    -- Perturb input i by h in {-1.0, -0.8, ..., 1.0}; stepping over integers
    -- avoids floating-point drift in the loop variable and in the zero check.
    for step = -5, 5 do
        local h = step * 0.2
        local inputs_h = inputs:clone()
        if step ~= 0 then inputs_h[{{},{i,i}}]:add(h) end
        local outputs_h = model:forward(inputs_h)
        -- Accumulate squared output differences caused by perturbing input i.
        importance[i] = importance[i] + torch.add(outputs_h, -1, outputs):pow(2):sum()
    end -- for h
end -- for i
importance:div(samples_count)
print("\nimportance:\n", importance)