
TL;DR: The outputs show no predictive power.

I want to predict the next element of a sequence of categorical data. In this context, the neural network should determine the most likely next element in a sequence based on the sequence itself.

I have 3 classes, so a completely random prediction has a 1/3 chance of success. The results I have gotten so far are close to 1/3, so the neural network has not improved on random guessing. I tried using a fixed repeating pattern, which should be obvious to the process, but it does not seem to be able to track patterns at all. That said, I am new to neural network methods for prediction, so I'm not sure whether fixed patterns are actually obvious to a neural network.

Regarding the inputs, here is what I have tried (say the sequence has N elements):

(1) The x inputs are the indexes of the sequence and the y inputs are the elements (N samples).

(2) The x input is a vector with the sequence elements except the last one, and the y input is a vector with the sequence elements except the first one (1 sample).

(3) The x inputs are vectors with an increasing number of elements from the sequence, i.e., x1 carries only the first element, x2 carries the first two elements, and so on. Each y input is the element that follows its x input, so together the y inputs form a vector with all the sequence elements except the first. A sketch of the resulting pairs is shown right below.
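
For concreteness, a minimal sketch (on a toy, already-encoded sequence) of how configuration (3) expands the data into (prefix, next-element) pairs:

import numpy as np

seq = np.array([0, 1, 2, 0, 1, 2])          # toy sequence, already encoded (A=0, B=1, C=2)
x = [seq[:i] for i in range(1, len(seq))]   # growing prefixes
y = [seq[i] for i in range(1, len(seq))]    # the element that follows each prefix
for xi, yi in zip(x, y):
    print(xi, "->", yi)
# [0] -> 1
# [0 1] -> 2
# [0 1 2] -> 0
# ...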

The code using input configuration (3) is shown below (updated):

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

data = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"]
prev_history = []
correct_guess = 0

for i in range(1, 100):
    input_size = len(data)
    vocab = ["A", "B", "C"]
    layer = keras.layers.StringLookup(vocabulary=vocab)

    # Encode to integer classes 0..2 (StringLookup reserves index 0 for OOV)
    x_train = np.array(layer(data))
    x_train = x_train - 1

    # Configuration (3): pair every prefix of the sequence with the element
    # that follows it (use j here so the outer loop variable i is not shadowed)
    x_sequences = []
    y_sequences = []
    for j in range(1, input_size):
        x_sequences.append(x_train[:j])
        y_sequences.append(x_train[j])

    # Extend the longest prefix by one pad value so all rows pad to length
    # input_size, matching the prediction input below
    x_sequences[-1] = np.concatenate((x_sequences[-1], [-1]), axis=0)
    x_train = keras.utils.pad_sequences(x_sequences, padding='post', value=-1)
    y_train = np.array(y_sequences)

    # A small feed-forward network over the padded prefix vectors
    model = keras.Sequential([
        keras.layers.Dense(8, activation='relu', input_dim=input_size),
        keras.layers.Dense(8, activation='relu'),
        keras.layers.Dense(3, activation='softmax')  # one probability per class
    ])

    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=1, batch_size=1, shuffle=False)
    
    # Score the prediction made on the previous iteration against the element
    # that was actually appended to the data
    if prev_history:
        if data[-1] == prev_history[-1]:
            correct_guess += 1
        correct_guess_rate = correct_guess / len(prev_history)

    # Predict the next element from the full encoded sequence
    x_pred = np.array(layer(data))
    x_pred = x_pred - 1
    x_pred = x_pred.reshape(1, input_size)
    predchance = model.predict(x_pred)
    max_n = np.argmax(predchance)
    pred = layer.get_vocabulary()[max_n + 1]  # +1 skips the OOV token at index 0
    prev_history.append(pred)

    # Extend the data with the next element of the fixed repeating A-B-C pattern
    choices = ["A", "B", "C"]
    data.append(choices[input_size % len(choices)])

I was expecting results at least better than 1/3 successful predictions. I tried increasing the sequence length, adding layers, increasing layer units, increasing epochs, and changing the model... I must also confess I am not sure about the input configuration and batch_size, as I had a hard time getting the code to run without errors. I have tried to find similar problems with solutions and to learn what I am doing wrong, but I decided to post here.

Any tips or pointers to possible errors?

Update

I appreciate the replies from community members; I have learned a lot from this simple project. The previous input configurations weren't adequate for the objective, so I switched to sliding windows of three elements as inputs, each paired with the element that follows it as the target.
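
Under this scheme every window of three consecutive elements maps to the element that follows it (as elbe noted in the comments: ABC -> A, BCA -> B, CAB -> C). A minimal sketch, assuming the same A=0, B=1, C=2 encoding, of the pairs that keras.utils.timeseries_dataset_from_array produces:

import numpy as np
from tensorflow import keras

seq = np.array([0, 1, 2, 0, 1, 2, 0])  # encoded A=0, B=1, C=2
ds = keras.utils.timeseries_dataset_from_array(
    seq, targets=seq[3:], sequence_length=3, batch_size=1
)
for window, target in ds:
    print(window.numpy(), "->", target.numpy())
# [[0 1 2]] -> [0]   (ABC -> A)
# [[1 2 0]] -> [1]   (BCA -> B)
# [[2 0 1]] -> [2]   (CAB -> C)
# [[0 1 2]] -> [0]   (ABC -> A)

With this modification, the resulting code is: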

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

data = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"]
prev_history = []
correct_guess = 0

for i in range(1, 100):
    input_size = len(data)
    vocab = ["A", "B", "C"]
    layer = keras.layers.StringLookup(vocabulary=vocab)
    # Encode to integer classes 0..2 (StringLookup reserves index 0 for OOV)
    data_u = layer(data)
    data_u = data_u - 1
    data_u = tf.reshape(data_u, [-1, 1])  # add a feature dimension for the RNN: (N, 1)
    seq_length = 3

    # Slide a window of seq_length over the data; the target of each window is
    # the element that immediately follows it
    dataset = keras.utils.timeseries_dataset_from_array(
        data_u.numpy(),
        targets=data_u[seq_length:],
        sequence_length=seq_length,
        batch_size=2
    )

    model = keras.Sequential([
        keras.layers.SimpleRNN(32, activation='relu', input_shape=[None, 1]),
        keras.layers.Dense(3, activation='softmax')
    ])

    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.fit(dataset, epochs=10)  # shuffle is ignored when fitting on a tf.data dataset
    
    # Score the prediction made on the previous iteration
    if prev_history:
        if data[-1] == prev_history[-1]:
            correct_guess += 1
        correct_guess_rate = correct_guess / len(prev_history)

    # Predict the next element from the last seq_length observations
    x_pred = data_u[-seq_length:]
    x_pred = tf.reshape(x_pred, [1, seq_length, 1])
    predchance = model.predict(x_pred)
    max_n = np.argmax(predchance)
    pred = layer.get_vocabulary()[max_n + 1]  # +1 skips the OOV token at index 0
    prev_history.append(pred)

    # Extend the data with the next element of the repeating pattern
    choices = ["A", "B", "C"]
    data.append(choices[input_size % len(choices)])
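
To sanity-check what the model has learned, one can feed it each of the three distinct windows the pattern contains and inspect the predicted class. A hypothetical snippet, assuming the model and layer from the last loop iteration are still in scope:

for window in ([0, 1, 2], [1, 2, 0], [2, 0, 1]):  # ABC, BCA, CAB
    probs = model.predict(tf.reshape(window, [1, 3, 1]), verbose=0)
    print(window, "->", layer.get_vocabulary()[np.argmax(probs) + 1])
# for the repeating pattern this should print A, B and C respectively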

With this code, the predictions were very accurate. I tried different patterns, and the predictions were consistently better than random guessing.

  • The data is only the repeating ABC vector I wrote in the code. I did not try another set of data that could involve randomness. I'm not sure I understood your question. – Segala Jun 07 '23 at 22:38
  • You don't need three layers in the model because the data is very simple and you don't have much. I tried your code using triplets in `x_sequences` instead of the full past + padding, then it works fine. – elbe Jun 08 '23 at 12:07
  • @GoldenLion This is the code. I removed the function call and unnecessary lines (comments and the loop that makes more predictions and evaluates the % of correct guesses) to summarize and make for clean reading. To add context to my purpose: I am trying to learn how neural network prediction works in practice by using this code. I am just starting with this simple sequence code. – Segala Jun 08 '23 at 16:58
  • I am getting this error running your code: Detected at node 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits' defined at (most recent call last). It means the input is incorrect. – Golden Lion Jun 08 '23 at 17:33
  • Just a simple remark: if the last three letters in x are ABC then the y letter is A, with BCA it is B, with CAB it is C... – elbe Jun 08 '23 at 18:47
  • @GoldenLion You are right, I ended up erasing one line. The x_train vector must have 1 subtracted from it in order to fit into the 3 classes. I will update the code. – Segala Jun 08 '23 at 19:04
  • @elbe I will try to use separate sequences of shorter length. To be honest, I wasn't expecting that to be necessary for such a simple sequence. I was looking to use the tf.keras.utils.timeseries_dataset_from_array command for this purpose; however, I am not sure whether it is suitable for ordering, as I don't have a time value associated. – Segala Jun 08 '23 at 19:11

1 Answer


I approached the problem as a categorical classification, where the input should map to one of three output classes. What is your goal?

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt


def plotHistory(history):
    plt.plot(history.history['accuracy'])
    plt.title('accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train'], loc='best')
    plt.show()

data = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"]
categories = ["A", "B", "C"]
target = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"]


# One-hot encode the inputs into a DataFrame with one column per category
df = pd.DataFrame(columns=categories)
for index, item in enumerate(data):
    for category in categories:
        if category == item:
            df.loc[index, item] = 1
        else:
            df.loc[index, category] = 0

for item in categories:
    df[item] = df[item].astype('category')

encoder = LabelEncoder()

# Integer-encode the targets; the to_categorical/argmax round trip below just
# recovers the integer labels that sparse_categorical_crossentropy expects
encoded_target = encoder.fit_transform(target)
sparse_cat = to_categorical(encoded_target, num_classes=len(categories))
sparse_cat = np.asarray(sparse_cat).astype('float32')
sparse_cat = np.argmax(sparse_cat, axis=-1)

model = Sequential()

model.add(Dense(512, input_shape=(len(categories),)))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

history = model.fit(np.asarray(df).astype(int), sparse_cat, epochs=10, batch_size=100, verbose=0)

plotHistory(history)

(Image: training-accuracy plot produced by plotHistory.)
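
For completeness, a hypothetical sketch of how to read predictions out of this trained classifier. Note that it predicts the class of each input element rather than the next element of the sequence; predicting the next element would need windowed inputs, as discussed in the comments below.

probs = model.predict(np.asarray(df).astype(int), verbose=0)  # one row of class probabilities per element
predicted = encoder.inverse_transform(np.argmax(probs, axis=-1))
print(predicted)  # echoes the input letters once the classifier has fit the mapping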

  • Isn't your input data the same as your output data (just formatted differently)? Also, with no sequencing of the input data, aren't you only passing one datapoint into the model? Why the need for such huge layers? What is the model even supposed to learn from this... repeating its input? – Quantum Jun 11 '23 at 19:54
  • I agree: it's a simple classification problem. IMO you would need (at least) the last three characters as input. Your code does well with the formatting; I'd suggest implementing the sequence of the last characters as input and also shrinking the model, maybe to a 4x4 model. – Quantum Jun 12 '23 at 12:07
  • https://stackoverflow.com/questions/49161174/tensorflow-logits-and-labels-must-have-the-same-first-dimension this article shows the sequencing of the labels output. I think it may be more what you're attempting to solve. – Golden Lion Jun 12 '23 at 13:18
  • Added sparse_cat = np.argmax(sparse_cat, axis=-1) to create the single array of categories and switched to sparse_categorical_crossentropy. – Golden Lion Jun 12 '23 at 13:26
  • I see you posted another code, and I appreciate it. It may be trivial, but how can you predict the next element of the input sequence with this trained model? I'm sorry if it is too simple, but I've just started studying neural networks. – Segala Jun 12 '23 at 14:48
  • I would have to build an LSTM encoder and decoder to predict the next sequence in the input chain. Perhaps you can create a new question about training an encoder that can predict the next character in the string. – Golden Lion Jun 12 '23 at 19:30