Digging deeper into LSTM Keras

Question

I am running a regression problem on sensor data having four columns using the LSTM framework. I have not yet used any regularization.

The code I used specified below;

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras import callbacks
from keras.layers import Flatten

# load the dataset
gbx_data = pd.read_csv('/home/prm/Downloads/aggregated_vibration.csv', usecols=[4,5,6,7])
dataset = gbx_data.values
dataset = dataset.astype('float32')

scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)


train_size = int(len(dataset) * 0.63)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))

def create_dataset(dataset, look_back):
  dataX, dataY = [], []
  for i in range(len(dataset)-look_back-1):
    a = dataset[i:(i+look_back), :]
    dataX.append(a)
    dataY.append(dataset[i + look_back, :])
  return np.array(dataX), np.array(dataY)


look_back = 10
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

trainX = trainX.reshape(trainX.shape[0], look_back, trainX.shape[2])    # model input shape & model output shape will be same always #
testX = testX.reshape(testX.shape[0], look_back, testX.shape[2])

batch_size = 120

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_epoch_end(self, epoch, logs={}):
        self.losses.append(logs.get('loss'))


model=Sequential()
model.add(LSTM(10, return_sequences=True, input_shape=(look_back, 4), activation='relu'))
model.add(Dropout(0.2))
model.add(LSTM(12, return_sequences=True, input_shape=(look_back, 4), activation='relu'))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(4, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = LossHistory()
model.fit(trainX, trainY, epochs=10, batch_size=batch_size, callbacks=[history])
print(history.losses)

I will like to know the specifications for the following questions;

For each epoch end, i am getting the loss for the LossHistory class. How can I get the weights after each epoch? I know model.get_weights() gives me all the weights. But how can I get them after each epoch?
How can I optimally know which activation function I should use in LSTM & Dense layers so that my data performs 'best' & gives me good accuracy?
model.get_config() gives me 'stateful': False. If I execute a statefull LSTM, what change will actually occur & checking which values I can understand the change?
If return_sequences=False what change will occur?
How can I choose optimally the number of hidden nodes(neurons) for LSTM & Dense layer?

Running the above code, the loss history after 10 epochs are as follow,

[0.016399867401633194, 0.0029856997435597997, 0.0021351441705040426, 0.0016288172078515754, 0.0012535296516730061, 0.0010065438170736181, 0.00085688360991555948, 0.0007937529246583822, 0.00073356743746738303, 0.00069794598373472037]

with accuracy 77%.

I am adding the table of several iterative approaches as well.

Sorry If I asked a lot. Please share your assistance if possible.

Daniel Möller · Answer 1 · 2017-07-08T17:45:55.530

Weights after each epoch:

For doing things after each epoch, you can use a Callback, especially the LambdaCallback, which allows a very flexible usage.

Define a lambda callback that will get weights after each epoch:

getWeightsCallback = LambdaCallback(on_epoch_end=getWeightsFunction)

Where:

myWeights = []
def getWeightsFunction(epoch,logs):
    #adapt this code:
    myWeights.append(model.get_weights())

Then add the callback to your fit method:

model.fit(....., callbacks=[getWeightsCallback])

Activation functions

Unfortunately, I can't answer that, if there is an answer. But I do believe it's an intuitive thing, and it should be experimented until you find what is best for your specific task and model.

What I know, though, is about the last layer. This one is completely related to the final task.

A classification problem, with only one true class among many, benefits from using activation='softmax' and loss='categorical_crossentropy'
A classification problem with many true classes normally uses activation='sigmoid' and loss='binary_crossentropy'
Other problems should have better options too, depending on the application.

Stateful networks.

Recurrent networks have an "inner state", which is roughly the "memory" built from stepping through a sequence.

This state is unique for each sequence. Each sequence builds a state.

In non-stateful networks, the state is reset for every sequence
In stateful networks, the state is not automatically reset.

The idea of not resetting the states is to be able to divide each sequence in batches. If your sequences are too long (causing RAM or performance issues), you divide them in parts, and the model will understand that your batches are not "new sequences", but "sequels of the previous sequences".

The noticeable changes are the need of defining extra parameters such as batch size and passing the data correctly in sequence.

Keras documentation
An indirectly related question

Return sequences

Since recurrent networks work in time steps, every time step has a result.
You may choose to output all these results, ending up with a sequence (same number of time steps as the input). Or you may choose to get only the final result, discarding the time steps:

Return sequences=True: outputs (batch size, time steps, memory cells)
Return sequences=False: outputs (batch size, memory cells)

Same documentation as before

Ideal number of cells

Sorry, that is definitely an open question. It's totally dependent on what you want to do, the size of your data, the architecture of your model.

There is really no ready answer. Creating a perfect architecture for a certain application is exactly what everyone is seeking around the world.

You can make experiments or try to find papers that work with the same kind of thing you are working to see what are the best practices "so far".

Some related problems:

Overfitting: your model is too strong, or you have too little data. Your training accuracy gets great, but your validation accuracy doesn't evolve. Roughly, the model is memorizing the training data because it's just too good
Underfitting: your model is too weak, it simply can't solve the task

`myWeights.add(model.get_weights())` is not working for me though. Throwing me ; NameError: name 'myWeights' is not defined — Hindol Ganguly, Jul 08 '17 at 07:33
Adapt that code to your needs. Where do you want to store the weights? Print them? Save them? Put them in a var? That's where you choose what to do with `model.get_weights()` -- I edited my answer for a more concrete example. — Daniel Möller, Jul 08 '17 at 17:44