
I am trying to predict neutron widths from resonance energies, using a Neural Network (I'm quite new to Keras/NNs in general so apologies in advance).

There is said to be a link between resonance energies and neutron widths, and since the resonance energy increases monotonically, I believe this can be modelled similarly to a time-series problem.

In essence I have 2 columns of data, with the first column being the resonance energy and the second containing the corresponding neutron width on each row. I have decided to use an LSTM layer so that the network can use previous computations to help with its predictions.

From various tutorials and other answers, it seems common to use a "look_back" argument when creating the dataset, so that the network can use previous timesteps to help predict the current timestep, e.g.

trainX, trainY = create_dataset(train, look_back)
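
For illustration, here is a toy example (made-up numbers, not my real data) of what I understand such a windowing function to produce with look_back = 2: each X entry is a window of previous energies and each Y entry is the width at the step being predicted.

import numpy

# Toy 2-column array: [resonance energy, neutron width] per row (made-up values).
toy = numpy.array([[1.0, 0.10],
                   [2.0, 0.20],
                   [3.0, 0.15],
                   [4.0, 0.30],
                   [5.0, 0.25]])

look_back = 2
X, Y = [], []
for i in range(len(toy) - look_back - 1):
    X.append(toy[i:i + look_back, 0])   # window of previous energies: [1, 2], then [2, 3]
    Y.append(toy[i + look_back, 1])     # width at the step being predicted: 0.15, then 0.30
print(numpy.array(X), numpy.array(Y))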

I would like to ask the following about forming the NN:

1) Given my particular application do I need to explicitly map each resonance energy to its corresponding neutron width on the same row?

2) Look_back indicates how many previous values the NN can use to help predict the current value, but how is it incorporated with the LSTM layer? I.e. I don't quite understand how the two are used together.

3) At which point do I inverse the MinMaxScaler?

Those are the main two queries: for 1) I have assumed it's okay not to, and for 2) I believe it is possible but I don't really understand how. I can't quite work out what I have done wrong in the code; ideally, once it works, I would like to plot the relative deviation of the predicted values from the reference values for the train and test data. Any advice would be much appreciated:

import numpy
import matplotlib.pyplot as plt
import pandas
import math

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]          # window of look_back energies
        dataX.append(a)
        dataY.append(dataset[i + look_back, 1])    # neutron width at the step being predicted
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)      
# load the dataset
dataframe = pandas.read_csv('CSVDataFe56Energyneutron.csv', engine='python') 
dataset = dataframe.values
print("dataset")
print(dataset.shape)
print(dataset)

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
print(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67) 
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)  
testX, testY = create_dataset(test, look_back)
# reshape input to be  [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0],look_back, 1))
# create and fit the LSTM network
number_of_hidden_layers = 16
model = Sequential()
model.add(LSTM(6, input_shape=(look_back,1)))
for x in range(0, number_of_hidden_layers):
    model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, epochs=200, batch_size=32)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
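
Once the fit works, this is roughly how I intend to plot the relative deviation of predicted against reference widths (a sketch only; it assumes the code above has run, and it undoes the MinMaxScaler by hand for the width column, i.e. column 1 of the original dataset):

# Undo the scaling for the width column only, then plot the relative deviation.
width_min, width_max = scaler.data_min_[1], scaler.data_max_[1]

def invert_width(scaled):
    # inverse of MinMaxScaler with feature_range=(0, 1): x = x_scaled*(max - min) + min
    return scaled * (width_max - width_min) + width_min

trainPred_w = invert_width(trainPredict[:, 0])   # predictions back on the original scale
trainTrue_w = invert_width(trainY)               # reference widths on the original scale
relative_dev = (trainPred_w - trainTrue_w) / trainTrue_w   # assumes nonzero reference widths

plt.plot(relative_dev, 'o')
plt.xlabel('sample index (train)')
plt.ylabel('(predicted - reference) / reference')
plt.show()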
scrpy
1 Answer


1) Given my particular application do I need to explicitly map each resonance energy to its corresponding neutron width on the same row?

Yes, you have to do that. Basically your data has to be in the shape X = [timestep, timestep, ...], y = [label, label, ...].

2) Look_back indicates how many previous values the NN can use to help predict the current value, but how is it incorporated with the LSTM layer? I.e. I don't quite understand how the two are used together.

An LSTM is a sequence-aware layer. You can think of it a bit like a hidden Markov model: it takes the first timestep, calculates something, and at the next timestep the previous calculation is taken into account. look_back, which is usually called sequence_length, is just the maximum number of timesteps.
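
As a rough sketch of how look_back becomes the timestep dimension the LSTM iterates over (toy numbers, not your data):

import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense

look_back = 3
model = Sequential()
model.add(LSTM(6, input_shape=(look_back, 1)))   # look_back timesteps, 1 feature per step
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# One sample = one window of look_back consecutive (scaled) energies.
window = numpy.array([[[0.10], [0.12], [0.15]]])   # shape (1, look_back, 1)
print(model.predict(window).shape)                 # (1, 1): one predicted width per window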

3) At which point do I inverse the MinMaxScaler?

Why should you do that? Furthermore, you don't need to scale your input.

It seems like you have a general misconception in your model. If you have input_shape=(look_back, 1), you don't need LSTMs at all. If your sequence is just a sequence of single values, it might be better to avoid LSTMs. Furthermore, fitting your model should include validation after each epoch to track the loss and validation performance.

history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=200,
                    validation_data=(x_test, y_test),
                    verbose=1)
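
You can then plot the training and validation loss per epoch from the returned History object to see whether the model overfits (a sketch, continuing from the fit call above):

import matplotlib.pyplot as plt

# 'history' is the return value of the model.fit(...) call above.
# A widening gap between the two curves indicates overfitting.
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()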
ixeption
  • thanks 1) you mention the data being X=[energy[0], energy[1],...] y=[width[0], width[1],...], do you mean explicit mapping such as using a dictionary? Or is that not needed, given that each row already has the corresponding width? 2) so sequence_length also indicates the max number of timesteps to consider in a calculation, but it affects the shape of X and Y? 3) I scaled my dataset as I thought larger values could disproportionately affect the weights in my network, given my values span a large range. 4) What difference does using (x_test, y_test) for validation provide, as opposed to model.predict? – Askquestions Jul 31 '18 at 12:19
  • 1) you don't need a dict, it's just that energy[0] is labelled with width[0] and so on. 2) yes, you are required to pad your data to the seq_length with the pad_sequences function in keras 3) Scaling is ok. It does execute the validation after every epoch, so you can track your model performance and overfitting will be visible. – ixeption Jul 31 '18 at 12:51
  • If I were to scale, at which point should I inverse transform the data? I have noticed scaling reduces the RMSE for the train/test score to 0/0.01, and the predictions almost mirror the reference values (which seems almost too good). Ah I see, but then using validation_data=(x_test, y_test) essentially leaves no data for model.predict. Or, if I understood correctly, if I use a validation set I don't need to use model.predict at all, right? – Askquestions Jul 31 '18 at 15:04
  • I don't know exactly why you want to inverse transform the data. Regarding the validation, yes, that's right. – ixeption Jul 31 '18 at 15:06
  • The main objective is to plot the relative deviation of the predicted network values against the reference values (essentially plot trainPredict = model.predict(trainX) and testPredict = model.predict(testX)). So I guess for that application you perhaps don't need to inverse transform the data? But if I wanted to use the predicted values I would need to inverse transform them to get values back on the normal scale (basically get rid of the MinMaxScaler effect). – Askquestions Jul 31 '18 at 18:04
  • Another problem with MinMaxScaler is that I obtain an output of 'Test Score: 0.00 MSE (0.01 RMSE)', which is confusing as of course the loss values are of the order 10^-6. Would such low values indicate overfitting, or is that just expected given the magnitude of the scaled values? – Askquestions Jul 31 '18 at 18:36
  • Your prediction is a regression problem, right? If your prediction target is in the range of 10^-6, you are right to scale these values up. I am not sure what your data really looks like. – ixeption Jul 31 '18 at 19:08
  • Yes, it's a regression problem, but there's quite a complex relationship between the two variables. Ah, my data is actually provided in a pastebin link in the [post](https://pastebin.com/index/9qwJU3AQ); I'm trying to predict 30% of the second column. The actual values (not scaled) aren't in the 10^-6 region, but isn't the net going to try to predict scaled values anyway? – Askquestions Jul 31 '18 at 19:33
  • But you have more data than this right? LSTMs need a lot of data. I am talking about millions of such pairs. – ixeption Jul 31 '18 at 20:36
  • Millions? Oh no. Essentially the data I have is for one element; each element in the periodic table has its own characteristic energies/neutron widths. In theory I could train on all of them, but I feel the pattern of energies/neutron widths is unique to each element, so I'm not sure how that would help. I also wonder whether such low RMSE scores are indicative of not having enough training data? – Askquestions Jul 31 '18 at 23:29
  • I guess so, yes – ixeption Aug 01 '18 at 06:41