
I am creating a Hidden Markov Model and an LSTM neural network to make predictions on the same dataset, so that I can compare the performance of the two models. I have my HMM working well, but when I try to train my LSTM on the same dataset, I can't get the network to learn anything at all. For reference, here is a generalized diagram that describes what I'm attempting to accomplish:

[Figure: LSTM Representational Diagram]

In order to implement an LSTM neural network, I followed this article, which uses a small Keras model to make predictions on a dataset with multiple inputs, like my problem. However, after implementing a model very similar to what the tutorial laid out (code below), my accuracy never goes above 40%. In fact, the accuracy is exactly the same from epoch 1 all the way to whatever epoch I end training on. For some reason, my loss is also extremely low no matter what, which makes me think the accuracy should be higher. Because the loss and the accuracy don't line up, I suspect that I'm either representing my data incorrectly for my model or that some parameter in my model is totally wrong.

My dataset is very basic, so I feel like I'm missing something large. I've created a CNN before rather easily, and I thought making an LSTM would be easy as well as long as I followed a tutorial. If I want to create a very basic LSTM to make very basic predictions, what sort of model should I create? What loss function should I use when working with categorical classifications and an LSTM? One last specific question: what generally causes the accuracy to never improve and always stay the same?

What I have so far for the implementation of the LSTM:

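# Imports assumed by the code below (not shown in the original post).
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Number of games to go back for next prediction.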
TIME_STEPS = 1

# Gets the game data from the generated CSV file.
# Column 1 - Game Number
# Column 2 - Result
# Column 3 - My Rating
# Column 4 - Opponent's Rating
dataFile = 'ChessData.csv'
data = pd.read_csv(dataFile, index_col='Game Number')
df = data.copy()

# Splits the CSV file into training and validation data.
train_size = int(len(df) * 0.8)
train_dataset, test_dataset = df.iloc[:train_size], df.iloc[train_size:]

# Splits the data based on target/dependent variables.
# Also creates the X and y for supervised learning.
X_train = train_dataset.drop('Result', axis=1)
y_train = train_dataset.loc[:, ['Result']]

# Splits the test data into X and y as well.
X_test = test_dataset.drop('Result', axis=1)
y_test = test_dataset.loc[:, ['Result']]

# Different scaler for input and output
scaler_x = MinMaxScaler(feature_range = (0,1))
scaler_y = MinMaxScaler(feature_range = (0,1))
# Fit the scaler using available training data
input_scaler = scaler_x.fit(X_train)
output_scaler = scaler_y.fit(y_train)
# Apply the scaler to training data
y_train = output_scaler.transform(y_train)
X_train = input_scaler.transform(X_train)
# Apply the scaler to test data
y_test = output_scaler.transform(y_test)
X_test = input_scaler.transform(X_test)

# Create a 3D input
def create_dataset (X, y, time_steps = 1):
    Xs, ys = [], []
    for i in range(len(X)-time_steps):
        v = X[i:i+time_steps, :]
        Xs.append(v)
        ys.append(y[i+time_steps])
    return np.array(Xs), np.array(ys)

# Creates the 3D input by calling create_dataset for both
# the training data and the testing data.
X_test, y_test = create_dataset(X_test, y_test, TIME_STEPS)
X_train, y_train = create_dataset(X_train, y_train, TIME_STEPS)


# Defines the LSTM Model
def create_model(units, m):
    model = Sequential()
    model.add(m (units = units, return_sequences = True,
                input_shape = [X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.2))
    model.add(m (units = units))
    model.add(Dropout(0.2))
    model.add(Dense(units = 1))
    #Compile model
    model.compile(optimizer=keras.optimizers.Adam(0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"])
    return model

# Creates an LSTM model instance
model_lstm = create_model(128, LSTM)

# Fits the LSTM Model
def fit_model(model):
    early_stop = keras.callbacks.EarlyStopping(monitor = 'val_loss',
                                              patience = 10)
    history = model.fit(X_train, y_train, epochs = 100, 
                      validation_split = 0.2, batch_size = 32,
                        shuffle = False, callbacks = [early_stop])
    return history

history_lstm = fit_model(model_lstm)

# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler_y.inverse_transform(prediction)
    return prediction

prediction_lstm = prediction(model_lstm)
print(prediction_lstm)
desertnaut
coleam

2 Answers


I see a problem in your network: you're using categorical cross-entropy with an output of size 1. I don't know what you're predicting, but if this is a binary classification (0 or 1, for instance), you should use binary_crossentropy. If it's a multi-class classification problem, you should use categorical cross-entropy, set the size of the last layer to the number of classes to predict (typically with a softmax activation), and one-hot encode your labels.
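
To illustrate why a single output unit would give a loss that stays near zero, here is a minimal NumPy sketch. It assumes, as Keras does for non-logit predictions as far as I know, that the loss first rescales each row of predictions to sum to 1 across the class axis. The function name and the sample values are made up for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """NumPy sketch of categorical cross-entropy on non-logit predictions.

    Assumption: predictions are rescaled so that each row sums to 1
    across the class axis before the log is taken.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

# One output unit: every prediction normalizes to 1, so the loss is ~0
# whatever the network outputs, and there is no signal to learn from.
y_true_single = np.array([[0.0], [0.5], [1.0]])
for outputs in ([[0.1], [0.2], [0.3]], [[5.0], [-3.0], [0.9]]):
    print(categorical_crossentropy(y_true_single, outputs))

# Three output units with a one-hot label: the loss now depends on the prediction.
y_true_onehot = np.array([[0.0, 0.0, 1.0]])
good = categorical_crossentropy(y_true_onehot, [[0.1, 0.1, 0.8]])
bad = categorical_crossentropy(y_true_onehot, [[0.8, 0.1, 0.1]])
print(good, bad)
```

Under that assumption, the near-zero loss and the frozen accuracy in the question would both follow from the output layer having one unit rather than one unit per class.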

One-hot encoding, with an example of 4 classes, means the label is an array of length 4 that is all zeros except for a 1 at the position of the corresponding class:

y1 = [0,0,1,0] #means third class 
y2 = [0,1,0,0] #means second class
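
One way to build such arrays from a column of labels is sketched below with plain NumPy. The label values ("win", "loss", "draw") are hypothetical, since the actual contents of the Result column are not shown in the question.

```python
import numpy as np

# Hypothetical raw labels; the real values in the 'Result' column are not shown.
results = np.array(["win", "loss", "draw", "win"])

# Map each distinct label to an integer class index (classes come back sorted).
classes, class_idx = np.unique(results, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes))[class_idx]

print(classes)
print(one_hot)
```

The resulting array has one row per sample and one column per class, which is the label shape that categorical cross-entropy expects when the last layer has one unit per class.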
B Douchet
  • Thanks for the response! That's definitely helpful because my problem is a multi classification problem which is why I am using categorical cross entropy. How can I change my Ys so that each y is an array to represent the one-hot encoding? – coleam Dec 01 '20 at 16:18
  • You have various options: you can do it yourself, use tf.one_hot https://www.tensorflow.org/api_docs/python/tf/one_hot, or use this: https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array. – B Douchet Dec 01 '20 at 17:05
0
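from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()

# y must be 2D with shape (n_samples, 1). fit_transform returns a sparse
# matrix, so convert it to a dense array before passing it to Keras.
labels = ohe.fit_transform(y).toarray()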
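
Once the labels are one-hot encoded, predictions come back as one row of class scores per sample, so they need to be mapped back to labels rather than passed through a scaler. The sketch below shows one way to do that. The label values and the `probs` array are hypothetical stand-ins for the Result column and for the output of `model.predict(X_test)`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical 'Result' column as a 2D array of shape (n_samples, 1).
y = np.array([["win"], ["loss"], ["draw"], ["win"]])

ohe = OneHotEncoder()
labels = ohe.fit_transform(y).toarray()  # dense array, one column per class

# Stand-in for model.predict(X_test): one row of class probabilities per sample.
probs = np.array([[0.2, 0.1, 0.7],
                  [0.6, 0.3, 0.1]])

# The column order of the encoding is given by ohe.categories_[0],
# so the index of the largest probability selects the predicted label.
predicted = ohe.categories_[0][probs.argmax(axis=1)]
print(predicted)
```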
R S