I am creating a Hidden Markov Model and an LSTM neural network to make predictions on the same dataset so I can compare the performance of the two models. My HMM works well, but when training my LSTM on the same dataset I'm having trouble getting the network to learn anything at all. For reference, here is a generalized diagram that describes what I'm attempting to accomplish:
In order to implement an LSTM neural network, I followed this article, which uses a small Keras model to make predictions on a dataset with multiple inputs, much like my problem. However, after implementing a model very similar to what the tutorial lays out (code below), my accuracy never goes above 40%. In fact, the accuracy stays exactly the same from epoch 1 all the way to whatever epoch I choose to end training on. For some reason, my loss is also extremely low no matter what, which makes me think the accuracy should be higher. Because the loss and the accuracy don't line up, I suspect that I'm either representing my data incorrectly for the model or that some parameter in my model is completely wrong.
My dataset is very basic, so I feel like I'm missing something obvious. I've built a CNN before rather easily, and I assumed making an LSTM would be just as straightforward if I followed a tutorial. If I want to create a very basic LSTM that makes very basic predictions, what sort of model should I create? What loss function should I use for categorical classification with an LSTM? And lastly, what generally causes the accuracy to never improve and always stay the same?
What I have so far for the implementation of the LSTM:
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler

# Number of games to look back for the next prediction.
TIME_STEPS = 1
# Gets the game data from the generated CSV file.
# Column 1 - Game Number
# Column 2 - Result
# Column 3 - My Rating
# Column 4 - Opponent's Rating
dataFile = 'ChessData.csv'
data = pd.read_csv(dataFile, index_col='Game Number')
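# Quick sanity check (my own addition, not from the tutorial): confirm the
# columns loaded as expected and how many of each Result class there are.
print(data.shape)
print(data.dtypes)
print(data['Result'].value_counts())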
df = data.copy()
# Splits the CSV file into training and validation data.
train_size = int(len(df) * 0.8)
train_dataset, test_dataset = df.iloc[:train_size], df.iloc[train_size:]
# Splits the data based on target/dependent variables.
# Also creates the X and y for supervised learning.
X_train = train_dataset.drop('Result', axis=1)
y_train = train_dataset.loc[:, ['Result']]
# Splits the test data for X and y and well.
X_test = test_dataset.drop('Result', axis=1)
y_test = test_dataset.loc[:, ['Result']]
# Different scaler for input and output
scaler_x = MinMaxScaler(feature_range = (0,1))
scaler_y = MinMaxScaler(feature_range = (0,1))
# Fit the scaler using available training data
input_scaler = scaler_x.fit(X_train)
output_scaler = scaler_y.fit(y_train)
# Apply the scaler to training data
y_train = output_scaler.transform(y_train)
X_train = input_scaler.transform(X_train)
# Apply the scaler to test data
y_test = output_scaler.transform(y_test)
X_test = input_scaler.transform(X_test)
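# Another check I added while debugging: after MinMaxScaler, the Result
# targets are floats in [0, 1] rather than the raw labels from the CSV.
print(np.unique(y_train))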
# Create a 3D input
def create_dataset(X, y, time_steps=1):
    Xs, ys = [], []
    # Slide a window of length time_steps over the features; each window
    # is paired with the target value that immediately follows it.
    for i in range(len(X) - time_steps):
        v = X[i:i + time_steps, :]
        Xs.append(v)
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)
# Creates the 3D input by calling create_dataset for both
# the training data and the testing data.
X_test, y_test = create_dataset(X_test, y_test, TIME_STEPS)
X_train, y_train = create_dataset(X_train, y_train, TIME_STEPS)
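# Shape check I use to confirm the windowing worked: with TIME_STEPS = 1 and
# the two feature columns (my rating and the opponent's rating), these should
# come out as (samples, 1, 2) for X and (samples, 1) for y.
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)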
# Defines the LSTM Model
def create_model(units, m):
    model = Sequential()
    model.add(m(units=units, return_sequences=True,
                input_shape=[X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.2))
    model.add(m(units=units))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    # Compile model
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
# Creates an LSTM model instance
model_lstm = create_model(128, LSTM)
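# Added for debugging (not in the tutorial): print the layer output shapes
# to double-check what the network actually looks like.
model_lstm.summary()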
# Fits the LSTM Model
def fit_model(model):
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss',
                                               patience=10)
    history = model.fit(X_train, y_train, epochs=100,
                        validation_split=0.2, batch_size=32,
                        shuffle=False, callbacks=[early_stop])
    return history
history_lstm = fit_model(model_lstm)
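# Added while debugging: print the per-epoch accuracy, which shows the exact
# same value from the first epoch to the last. (The key may be 'acc' on
# older Keras versions.)
print(history_lstm.history['accuracy'])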
# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler_y.inverse_transform(prediction)
    return prediction
prediction_lstm = prediction(model_lstm)
print(prediction_lstm)
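In case it's relevant, this is roughly how I check the accuracy on the held-out test data after training. It's a small debugging addition rather than part of the script above, and it just uses the standard Keras evaluate call with the loss and metric the model was compiled with:

# Evaluate on the windowed, scaled test data.
test_loss, test_acc = model_lstm.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")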