I have trying to setup a non-linear regression problem in Keras. Unfortunately, results show that overfitting is occurring. Here is the code,
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation = 'relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0)))
model.add(Dense(outdim, activation='linear'))
Adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=Adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2 , initial_epoch=0)
The results without regularization is shown here Without regularization. The mean absolute error for training is much less compared to validation, and both have a fixed gap which is a sign of over-fitting.
L2 regularization was specified for each layer like so,
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation = 'relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(outdim, activation='linear'))
Adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=Adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2 , initial_epoch=0)
The results for these are shown here L2 regularized result. The MAE for test is close to training which is good. However, the MAE for training is poor at 0.03 (without regularization it was much lower at 0.0028).
What can i do to reduce the training MAE with regularization?