How to correctly plot loss curves for training and validation sets?

Question

I want to plot loss curves for my training and validation sets the same way Keras does, but using scikit-learn. I chose the concrete dataset, which is a regression problem; it is available at:

http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/

So, I have converted the data to CSV and the first version of my program is the following:

Model 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("Concrete_Data.csv")
# shuffle, then split 80 % / 10 % / 10 % into train / validation / test
train, validate, test = np.split(df.sample(frac=1), [int(.8 * len(df)), int(.90 * len(df))])
Xtrain = train.drop(["ConcreteCompStrength"], axis="columns")
ytrain = train["ConcreteCompStrength"]
Xval = validate.drop(["ConcreteCompStrength"], axis="columns")
yval = validate["ConcreteCompStrength"]
mlp = MLPRegressor(activation="relu", max_iter=5000, solver="adam", random_state=2)
mlp.fit(Xtrain, ytrain)

plt.plot(mlp.loss_curve_, label="train")
mlp.fit(Xval, yval)                            #doubt
plt.plot(mlp.loss_curve_, label="validation")  #doubt
plt.legend()

The resulting graph is the following:

In this model, I doubt that the marked part is correct because, as far as I know, the validation or test set should be held out, so calling fit on it is probably wrong. The score I got is 0.95.

Model 2

In this model I try to use the validation score as follows:

df=pd.read_csv("Concrete_Data.csv")
train,validate,test=np.split(df.sample(frac=1),[int(.8*len(df)),int(.90*len(df))])
Xtrain=train.drop(["ConcreteCompStrength"],axis="columns")
ytrain=train["ConcreteCompStrength"]
Xval=validate.drop(["ConcreteCompStrength"],axis="columns")
yval=validate["ConcreteCompStrength"]
mlp=MLPRegressor(activation="relu",max_iter=5000,solver="adam",random_state=2,early_stopping=True)
mlp.fit(Xtrain,ytrain)

plt.plot(mlp.loss_curve_,label="train")
plt.plot(mlp.validation_scores_,label="validation")   #line changed
plt.legend()

For this model, I had to set early_stopping to True and plot validation_scores_, but the resulting graph looks a little weird:

The score I get is 0.82, but I read that this occurs when the model finds it easier to predict the data in the validation set than in the training set. I believe that is because I am using the validation_scores_ attribute, but I was not able to find any online reference about it.

What is the correct way to plot these loss curves when tuning my hyperparameters in scikit-learn?

Update: I have programmed it as advised, like this:

mlp=MLPRegressor(activation="relu",max_iter=1,solver="adam",random_state=2,early_stopping=True)
training_mse = []
validation_mse = []
epochs = 5000
for epoch in range(1,epochs):
    mlp.fit(X_train, Y_train) 
    Y_pred = mlp.predict(X_train)
    curr_train_score = mean_squared_error(Y_train, Y_pred) # training performances
    Y_pred = mlp.predict(X_valid) 
    curr_valid_score = mean_squared_error(Y_valid, Y_pred) # validation performances
    training_mse.append(curr_train_score) # list of training perf to plot
    validation_mse.append(curr_valid_score) # list of valid perf to plot
plt.plot(training_mse,label="train")
plt.plot(validation_mse,label="validation")
plt.legend()

but the resulting plot shows two flat lines:

It seems I am missing something here.

score 3 · Accepted Answer · edited Mar 01 '22 at 11:35

You shouldn't fit your model on the validation set. The validation set is usually used to decide what hyperparameters to use, not the parameters' values.

The standard way to do training is to divide your dataset into three parts

training
validation
test

For example, with an 80 % / 10 % / 10 % split.
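As a minimal sketch of such a split, assuming synthetic stand-in data rather than the concrete dataset, two calls to scikit-learn's train_test_split can produce the three parts. The array shapes and the random_state values are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 1000 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1000)

# Hold out 20 % of the rows first, then cut that part in half
# to obtain 10 % for validation and 10 % for test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_valid), len(X_test))
```

train_test_split shuffles the rows by default, so there is no need to shuffle the data beforehand.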

Usually, you would select a neural network (how many layers, how many nodes, which activation functions), train it only on the training set, check the result on the validation set, and finally on the test set.

I'll show a pseudo algorithm to make it clear:

for model in my_networks:       # hyperparameters selection
    model.fit(X_train, Y_train) # parameters fitting
    model.predict(X_valid)      # no train, only check on performances

Save each model's performance on the validation set, pick the best model (the one with the best score on the validation set), and then check its results on the test set:

model.predict(X_test) # this will be the estimated performance of your model
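The pseudo algorithm above could be made concrete roughly as follows. This is only a sketch: the data is synthetic, and the candidate names and hidden_layer_sizes values are hypothetical examples of "my_networks".

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (an assumption, not the concrete dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=500)

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Hypothetical candidates: only the hidden layer layout differs.
candidates = {"one_layer": (16,), "two_layers": (16, 16)}

fitted = {}
valid_mse = {}
for name, layers in candidates.items():        # hyperparameter selection
    model = MLPRegressor(hidden_layer_sizes=layers, max_iter=500,
                         random_state=0)
    model.fit(X_train, y_train)                # parameter fitting
    # No training here, only a performance check on the validation set.
    valid_mse[name] = mean_squared_error(y_valid, model.predict(X_valid))
    fitted[name] = model

# Pick the candidate with the lowest validation error ...
best_name = min(valid_mse, key=valid_mse.get)
# ... and only then touch the test set, once, for the final estimate.
test_mse = mean_squared_error(y_test, fitted[best_name].predict(X_test))
print(best_name, valid_mse[best_name], test_mse)
```

The test set is used a single time at the very end, so it does not influence which candidate is chosen.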

If your dataset is big enough, you could also use something like cross-validation.
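For reference, a cross-validated estimate could look like the sketch below. The data is synthetic and the network size is an arbitrary choice; cross_val_score with scoring="neg_mean_squared_error" returns negated MSE values, so the sign is flipped before reporting.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (an assumption, not the concrete dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=300)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)

# 5-fold cross-validation: each fold serves once as the validation part.
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_squared_error")
cv_mse = -scores  # scikit-learn negates errors so that higher is better
print(cv_mse.mean(), cv_mse.std())
```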

Anyway, remember:

the parameters are the network weights
you fit the parameters with the training set
the hyperparameters are the ones that define the net architecture (layers, nodes, activation functions)
you select the best hyperparameters checking the result of your model on the validation set
after this selection (best parameters, best hyperparameters) you get the model performances testing the model on the test set

To obtain the same result as Keras, you should understand that a call to fit() stops when the solver converges (controlled by tol), when it reaches max_iter epochs (200 by default, 5000 in your case), or when early stopping triggers, if you set early_stopping=True.

max_iter: int, default=200

Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

Check your model definition and arguments on the scikit page
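To see what the fitted attributes contain, here is a small sketch on synthetic data (the data and the max_iter value are assumptions). With early_stopping=True, scikit-learn sets aside a fraction of the training data (validation_fraction, 0.1 by default) and records a score on it after every epoch. For a regressor that score is R², not a loss, so plotting validation_scores_ next to loss_curve_ mixes two different scales.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (an assumption, not the concrete dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=400)

mlp = MLPRegressor(max_iter=50, early_stopping=True,
                   validation_fraction=0.1, random_state=0)
mlp.fit(X, y)

# One entry per epoch in both lists.
print("epochs run:", mlp.n_iter_)
print("training loss entries:", len(mlp.loss_curve_))
print("validation score entries:", len(mlp.validation_scores_))

# loss_curve_ is a loss (lower is better); validation_scores_ is R²
# (higher is better, at most 1), so the two curves are not comparable.
print("last training loss:", mlp.loss_curve_[-1])
print("last validation R²:", mlp.validation_scores_[-1])
```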

To get per-epoch curves like Keras, you could train one epoch at a time, check the result on the validation set after each epoch, and continue until you reach the desired number of epochs.

for example, something like this (if your model uses mse):

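from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

epochs = 5000

# early_stopping is left out: it is not used by partial_fit, and the
# validation set is evaluated explicitly in the loop below
mlp = MLPRegressor(activation="relu",
                   solver="adam",
                   random_state=2)
training_mse = []
validation_mse = []
for epoch in range(epochs):
    # partial_fit runs one more epoch and keeps the current weights;
    # calling fit here would restart training on every iteration
    mlp.partial_fit(X_train, Y_train)
    Y_pred = mlp.predict(X_train)
    curr_train_score = mean_squared_error(Y_train, Y_pred) # training performances
    Y_pred = mlp.predict(X_valid)
    curr_valid_score = mean_squared_error(Y_valid, Y_pred) # validation performances
    training_mse.append(curr_train_score)                  # list of training perf to plot
    validation_mse.append(curr_valid_score)                # list of valid perf to plot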

edited Mar 01 '22 at 11:35 by Mario
answered Oct 24 '20 at 18:43 by Nikaido

thanks @Nikaido I get it, but how can I plot the validation curves in this model? I would not like to use CV for this purposes – Little Oct 24 '20 at 18:49
@Little I have done an update. It is not accurate, maybe there is something faster, but it's to give you the idea – Nikaido Oct 24 '20 at 19:09
thank you @Nikaido, but I believe the mlp instruction should be inside the for loop, am I right? – Little Oct 24 '20 at 19:24
@Little, no, if you define the model every time in the loop it will initialize the network again. You want a progressive learning of your model, with an improvement at each epoch – Nikaido Oct 24 '20 at 19:47
@Little keep in mind that with my example I am testing only a network and its progressive training. If I want to test other networks, I need another loop outside, that iterate (and initialize) a new network with a new structure (hyperparameters) every time – Nikaido Oct 24 '20 at 19:50
Basically I am doing validation only with fixed hyperparameter. In this case the validation set is equivalent to the test set – Nikaido Oct 24 '20 at 19:56
@Little to make it clearer, the last example is like "opening up" the fit method (model.fit), which iterates under the hood. – Nikaido Oct 24 '20 at 19:58
@Little check also the answers to this question: https://stackoverflow.com/questions/46912557/is-it-possible-to-get-test-scores-for-each-iteration-of-mlpclassifier – Nikaido Oct 24 '20 at 20:42
thanks @Nikaido, I have tried to program it and got only two flat lines, could you be so kind to check it up what am I missing? – Little Oct 24 '20 at 21:17
@Little sorry, probably in the for loop you should use "partial_fit", not "fit" – Nikaido Oct 24 '20 at 21:26
@Little https://datascience.stackexchange.com/questions/68599/incremental-learning-with-sklearn-warm-start-partial-fit-fit – Nikaido Oct 24 '20 at 21:30
@Little unfortunately, incremental learning is not straightforward in scikit-learn – Nikaido Oct 24 '20 at 21:30
now it works with partial_fit, but the effect is more visible with a small number of epochs, like 500. Thank you for your incredible patience :). One last question: would you recommend sticking with Keras in situations like this? It seems easier than scikit for plotting these graphs. Any advice? – Little Oct 24 '20 at 21:32
@Little, it depends. If you need to do deep learning, Keras is better. If you work with small data, scikit-learn is better, I think. Keep in mind that Keras is built for neural networks, while scikit-learn is not focused on them and covers a lot of different models. It depends on your objective. You are welcome :). Don't forget to accept the best answer to your question! – Nikaido Oct 24 '20 at 21:36

Anelise Dick · Answer 2 · 2023-02-13T13:00:07.460

I had the same problem: I got two flat lines when using the code as advised. I solved it just by adding warm_start=True to the MLPRegressor parameters, as explained in "MLPRegressor - 1.17.9. More control with warm_start":

mlp=MLPRegressor(activation="relu",max_iter=1,solver="adam",random_state=2,early_stopping=True, warm_start=True)

The resulting plot is now correct (figure: train and validation loss curves).
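A self-contained sketch of the warm_start approach is shown below. It assumes synthetic data and a shorter run of 100 epochs. It also leaves early_stopping off, so that the explicit validation set is the only held-out data, which is a simplification of the line above. Each call to fit with max_iter=1 emits a ConvergenceWarning, which is silenced here.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (an assumption, not the concrete dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=500)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

# warm_start=True makes every fit() call continue from the current
# weights instead of re-initializing the network.
mlp = MLPRegressor(activation="relu", solver="adam", max_iter=1,
                   warm_start=True, random_state=2)

training_mse = []
validation_mse = []
with warnings.catch_warnings():
    # One epoch per call never "converges", so silence that warning.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for epoch in range(100):
        mlp.fit(X_train, y_train)  # one more epoch on the training set
        training_mse.append(
            mean_squared_error(y_train, mlp.predict(X_train)))
        validation_mse.append(
            mean_squared_error(y_valid, mlp.predict(X_valid)))

print(training_mse[0], training_mse[-1])
print(validation_mse[0], validation_mse[-1])
```

The two lists can then be passed to plt.plot with the labels "train" and "validation", as in the question.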
