
I have a binary classification problem where each data point has 3 time series, as follows:

data_point,   time_series1,      time_series2,      time_series3,  label
d1,         [0.1, ....., 0.5], [0.8, ....., 0.6], [0.8, ....., 0.8], 1
and so on

I am using the following code to perform my binary classification.

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(25, 4)))  # 25 timesteps, 4 features per step
model.add(Dense(50))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Since I am currently treating my classifier as a black box, I would like to dig deeper and see what happens inside.

More specifically, I would like to know the important features used by the LSTM to classify my data points. In particular, I want to answer the following questions:

  • Which time series (i.e. time_series1, time_series2, time_series3) was most influential in the classification?
  • What are the features extracted from the most influential time series?

I am happy to provide more details if needed.

OverLordGoldDragon
EmJ

1 Answer


The Attention Mechanism is designed for exactly this; a from-scratch implementation isn't simple, but ready-to-use repositories exist.

As to what attention 'is', see this SE answer, and/or this Quora answer; in a nutshell, it's a means of identifying the most 'important' timesteps, effectively mapping out a temporal 'heatmap'.
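To make that concrete, here is a minimal NumPy sketch of additive attention over LSTM-style sequence outputs. The weight names (w, b, v) are illustrative, not taken from any particular repository; in practice these are learned jointly with the rest of the network:

```python
import numpy as np

def attention_heatmap(seq_outputs, w, b, v):
    """Additive attention over the time axis.
    seq_outputs: (timesteps, hidden) array, e.g. LSTM outputs obtained
    with return_sequences=True for a single sample."""
    scores = np.tanh(seq_outputs @ w + b) @ v   # one scalar score per timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax: weights sum to 1
    context = weights @ seq_outputs             # weighted sum of timesteps
    return context, weights                     # weights = the temporal 'heatmap'

# Toy shapes matching the question: 25 timesteps, 100 LSTM units
rng = np.random.default_rng(0)
T, H = 25, 100
seq = rng.normal(size=(T, H))
w = rng.normal(scale=0.1, size=(H, H))
b = np.zeros(H)
v = rng.normal(scale=0.1, size=H)

context, heat = attention_heatmap(seq, w, b, v)
```

Plotting `heat` over the 25 timesteps gives the temporal heatmap: peaks mark the timesteps the model attended to most when forming the context vector that feeds the classifier.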

Lastly, as a tip, consider ditching LSTMs for IndRNNs; where the former struggles with 800-1000 timesteps, the latter has been shown to succeed with 5000+. Features are also more interpretable, since each channel is independent, without LSTM-style gating mechanisms. Though if speed is important, note there is no CuDNN-accelerated IndRNN.
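For reference, here is a toy sketch of the IndRNN recurrence (assuming a ReLU activation and illustrative weight shapes; real implementations also constrain the recurrent weights for stability):

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    # IndRNN recurrence: each hidden unit i has its own scalar recurrent
    # weight u[i], so hidden channels never mix through time; this is
    # unlike the LSTM, whose gates entangle all units at every step.
    return np.maximum(0.0, x_t @ W + u * h_prev + b)  # ReLU activation

rng = np.random.default_rng(0)
T, n_in, n_hid = 25, 3, 8              # 25 timesteps, 3 series, 8 hidden units
W = rng.normal(scale=0.1, size=(n_in, n_hid))
u = rng.uniform(0.9, 1.0, size=n_hid)  # recurrent weights kept near 1
b = np.zeros(n_hid)

h = np.zeros(n_hid)
for x_t in rng.normal(size=(T, n_in)):
    h = indrnn_step(x_t, h, W, u, b)
```

Because `u` is elementwise, each hidden channel is a function of the inputs and its own history only, which is what makes per-channel feature inspection more tractable than with an LSTM.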


OverLordGoldDragon