I'm trying to do character-level text prediction using recurrent neural networks (LSTM) on a dataset built from books. No matter how much I change the layer sizes or other parameters, the model always overfits.
I've tried changing the number of layers, the number of units in the LSTM layers, regularization, normalization, batch_size, shuffling the training/validation data, and switching to bigger datasets. Right now I'm working with a ~140 KB .txt book; I have also tried 200 KB, 1 MB, and 5 MB files.
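One of the regularized/normalized variants looked roughly like this; it is only a sketch, and the l2 weight, dropout rates, and use of BatchNormalization are example values rather than the exact settings from any single run:

    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense, BatchNormalization
    from keras.regularizers import l2

    # Rough sketch of one regularized variant -- the l2 weight,
    # dropout rates, and layer sizes here are example values only
    reg_model = Sequential()
    reg_model.add(LSTM(128, input_shape=(30, 1),
                       kernel_regularizer=l2(0.01), recurrent_dropout=0.3))
    reg_model.add(BatchNormalization())
    reg_model.add(Dropout(0.2))
    reg_model.add(Dense(60, activation='softmax'))  # 60 = placeholder vocab size
    reg_model.compile(loss='categorical_crossentropy', optimizer='adam')

None of these variants changed the overall picture.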
Creating training/validation data:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from keras.utils import np_utils

sequence_length = 30
x_data = []
y_data = []

# Slide a window of `sequence_length` characters over the text;
# each window predicts the character that follows it
for i in range(0, len(text) - sequence_length, 1):
    x_sequence = text[i:i + sequence_length]
    y_label = text[i + sequence_length]
    x_data.append([char2idx[char] for char in x_sequence])
    y_data.append(char2idx[y_label])

# Shape inputs as (samples, timesteps, features) and scale indices to [0, 1]
X = np.reshape(x_data, (len(x_data), sequence_length, 1))
X = X / float(vocab_length)
# One-hot encode the target characters
y = np_utils.to_categorical(y_data)

# Split into training and testing set without shuffling (sequences stay in order)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Shuffle the testing set
X_test, y_test = shuffle(X_test, y_test, random_state=0)
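The snippet above uses text, char2idx, and vocab_length without showing how they are built; a minimal sketch of that preprocessing (the file path and lowercasing here are assumptions) looks like:

    # Hypothetical preprocessing -- 'book.txt' and .lower() are assumptions
    with open('book.txt', 'r', encoding='utf-8') as f:
        text = f.read().lower()

    chars = sorted(set(text))  # vocabulary of distinct characters
    char2idx = {char: idx for idx, char in enumerate(chars)}
    vocab_length = len(chars)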
Creating model:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
# Three stacked LSTM layers; the first two return full sequences
# so the next LSTM gets one vector per timestep
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True,
               recurrent_initializer='glorot_uniform', recurrent_dropout=0.3))
model.add(LSTM(256, return_sequences=True,
               recurrent_initializer='glorot_uniform', recurrent_dropout=0.3))
model.add(LSTM(256, recurrent_initializer='glorot_uniform', recurrent_dropout=0.3))
model.add(Dropout(0.2))
# Softmax over the vocabulary to predict the next character
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
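The training call itself isn't shown above; for reference, a fit consistent with the split would look something like this (the epoch count and batch size are assumed values, since only batch_size experiments are mentioned):

    # Hypothetical training call -- epochs=50 and batch_size=128 are assumptions
    history = model.fit(X_train, y_train,
                        validation_data=(X_test, y_test),
                        epochs=50, batch_size=128, shuffle=True)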
I get the following training characteristics: the training loss keeps decreasing while the validation loss stops improving, which is the overfitting I mean.
I don't know what else to do about this overfitting; I've been searching the internet and trying many things, but none of them seem to work.
How could I get better results? The predictions don't seem good right now.