I'm having a problem with a model I want to train.
It's a typical sequence-to-sequence problem with an attention layer, where the input is a string and the output is a substring of the submitted string.
e.g.
Input          Ground truth
------------   ------------
helloimchuck   chuck
johnismyname   john
(This is just dummy data, not part of the real dataset ^^)
And the model looks like this:
from keras.models import Sequential
from keras.layers import Bidirectional, GRU, RepeatVector, TimeDistributed, Dense

model = Sequential()
# Encoder: bidirectional GRU returning the full sequence of hidden states
model.add(Bidirectional(GRU(hidden_size, return_sequences=True), merge_mode='concat',
                        input_shape=(None, input_size)))
model.add(Attention())                    # custom layer (see sketch below)
model.add(RepeatVector(max_out_seq_len))  # repeat the context vector per output step
model.add(GRU(hidden_size * 2, return_sequences=True))  # Decoder
model.add(TimeDistributed(Dense(units=output_size, activation="softmax")))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
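Note that Attention() is not a stock Keras layer that fits into a Sequential model; what is assumed here is a custom layer roughly along these lines (a simplified sketch, not necessarily the exact implementation), which collapses the encoder's time axis into a single context vector so that RepeatVector can work on it:

from keras import backend as K
from keras.layers import Layer

class Attention(Layer):
    # Scores each timestep with a learned weight vector, softmaxes the
    # scores over time, and returns the weighted sum of the encoder
    # states, i.e. a single context vector of shape (batch, features).
    def build(self, input_shape):
        self.w = self.add_weight(name='att_weight',
                                 shape=(int(input_shape[-1]), 1),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, x):
        scores = K.squeeze(K.dot(x, self.w), axis=-1)   # (batch, timesteps)
        weights = K.expand_dims(K.softmax(scores), -1)  # softmax over time
        return K.sum(x * weights, axis=1)               # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])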
The problem shows up in the training curves (loss plot omitted here): the training loss keeps falling while the validation loss starts rising again, i.e. the model overfits.
I'm using an early stopping criterion on the validation loss with patience=8:
self.Early_stop_criteria = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=8, verbose=0,
mode='auto')
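For context, the callback is passed to fit() roughly like this (X_train, y_train and the epoch count are placeholders, not the exact training call):

history = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=100,  # placeholder; early stopping ends training sooner
                    validation_split=0.2,
                    callbacks=[self.Early_stop_criteria])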
And I'm feeding the model one-hot encoded vectors.
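Concretely, the encoding is roughly like this (a sketch; the alphabet and padding handling here are assumptions, not the exact preprocessing):

import numpy as np

def one_hot_encode(string, alphabet, max_len):
    # Turn a string into a (max_len, len(alphabet)) one-hot matrix;
    # rows past the end of the string stay all-zero (padding).
    char_to_idx = {c: i for i, c in enumerate(alphabet)}
    x = np.zeros((max_len, len(alphabet)), dtype=np.float32)
    for t, ch in enumerate(string[:max_len]):
        x[t, char_to_idx[ch]] = 1.0
    return x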
BATCH_SIZE = 64
HIDDEN_DIM = 128
The thing is, I've tried other batch sizes, other hidden dimensions, and datasets of 10K, 15K, 25K, and now 50K rows. However, there is always overfitting, and I don't know why.
The test_size=0.2 and the validation_split=0.2 are the only parameters I haven't changed.
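For reference, the split is set up roughly like this (using scikit-learn's train_test_split is an assumption about the pipeline):

from sklearn.model_selection import train_test_split

# hold out 20% of the examples as the final test set;
# validation_split=0.2 in fit() then carves the validation set
# out of the remaining training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)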
I also made sure that the dataset is properly built.
The only idea I have left is to try another validation split, maybe 0.33 instead of 0.2.
I don't know if cross-validation would help.
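What I have in mind would be plain k-fold cross-validation, something like this (build_model() is a hypothetical helper that rebuilds the architecture above from scratch for each fold):

import numpy as np
import keras
from sklearn.model_selection import KFold

val_losses = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True).split(X):
    fold_model = build_model()  # hypothetical: rebuilds the model above
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=8)
    history = fold_model.fit(X[train_idx], y[train_idx],
                             batch_size=BATCH_SIZE, epochs=50,
                             validation_data=(X[val_idx], y[val_idx]),
                             callbacks=[early_stop])
    val_losses.append(min(history.history['val_loss']))
print('mean val_loss across folds:', np.mean(val_losses))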
Maybe someone has a better idea of what I could try. Thanks in advance.