
I'm pretty sure that the answer is no, but wanted to confirm...

When training a neural network or another learning algorithm, we compute the cost function J(θ) as a measure of how well the algorithm fits the training data (higher values mean a worse fit). During training, we generally expect to see J(θ) go down with each iteration of gradient descent.
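For concreteness, the kind of thing I mean is sketched below (plain NumPy, a made-up linear-regression example with a squared-error cost, purely illustrative): tracking J(θ) on the training data after each gradient-descent step.

```python
import numpy as np

# Toy data: y ≈ 3x + noise (a made-up example, just to have something to fit)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Add a bias column and initialise the parameters theta
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
theta = np.zeros(2)

def cost(theta, Xb, y):
    """Squared-error cost J(theta) on the given data."""
    residuals = Xb @ theta - y
    return (residuals ** 2).mean() / 2

lr = 0.5
for it in range(50):
    grad = Xb.T @ (Xb @ theta - y) / len(y)   # gradient of J(theta)
    theta -= lr * grad
    if it % 10 == 0:
        print(f"iteration {it:3d}  J(theta) = {cost(theta, Xb, y):.5f}")
# J(theta) on the *training* data should decrease across iterations
```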

But I'm just curious, would there ever be a value to computing J(θ) against our test data?

I think the answer is no, because we only evaluate on our test data once, so we would get just a single value of J(θ), and a single value seems meaningless except when compared with other values.


1 Answer


Your question touches on a very common terminological ambiguity: the one between the validation and the test sets (the Wikipedia entry and this Cross Validated post may be helpful in resolving it).

So, assuming that you indeed refer to the test set proper and not the validation one, then:

  1. You are right in that this set is only used once, just at the end of the whole modeling process.

  2. You are, in general, not right in assuming that we don't compute the cost J(θ) on this set.

Elaborating on (2): in fact, the only usefulness of the test set is exactly for evaluating our final model on a set that has not been used at all in the various stages of the fitting process (notice that the validation set has been used indirectly, i.e. for model selection); and in order to evaluate the model, we obviously have to compute the cost.
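As a rough sketch of that workflow (scikit-learn with a synthetic regression dataset and Ridge models, all of which are just assumptions for illustration): the test set is split off first, left untouched during fitting and model selection, and the cost is computed on it exactly once at the end.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# Split off the test set first; it will be touched only once, at the very end
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into training and validation sets
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Model selection on the validation set (so the validation set is used *indirectly*)
best_model, best_val_cost = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    val_cost = mean_squared_error(y_val, model.predict(X_val))
    if val_cost < best_val_cost:
        best_model, best_val_cost = model, val_cost

# Final evaluation: compute the cost on the test set, once
test_cost = mean_squared_error(y_test, best_model.predict(X_test))
print(f"validation MSE of selected model: {best_val_cost:.2f}")
print(f"test MSE (single, final evaluation): {test_cost:.2f}")
```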

I think that a possible source of confusion is that you may have in mind only classification settings (although you don't specify this in your question). True, in that case we are usually interested in the model performance regarding a business metric (e.g. accuracy), and not regarding the optimization cost J(θ) itself. But in regression settings it may very well be the case that the optimization cost and the business metric are one and the same thing (e.g. RMSE, MSE, MAE etc.). And, as I hope is clear, in such settings computing the cost on the test set is by no means meaningless, despite the fact that we don't compare it with other values (it provides an "absolute" performance metric for our final model).
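A tiny illustration of that last point (again scikit-learn with synthetic data, an assumed setup): a linear regression is fitted by minimizing squared error, and the figure we report for the final model on the test set is essentially the same quantity, so the test cost *is* the performance number.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# The optimizer minimizes squared error on the training data...
model = LinearRegression().fit(X_train, y_train)

# ...and the "business" number reported for the final model is the RMSE on the test set:
# the same cost, computed once, read as an absolute performance figure (no comparison needed)
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"test RMSE of the final model: {rmse_test:.2f}")
```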

You may find this and this answer of mine useful regarding the distinction between loss & accuracy; quoting from them:

Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, usually the loss and the business objective are the same, or at least can be the same in principle, e.g. the RMSE)...
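To make the loss-vs-accuracy distinction concrete, here is a small sketch (plain NumPy with made-up predicted probabilities): two sets of predictions with identical accuracy but different cross-entropy loss, showing that the two quantities measure different things.

```python
import numpy as np

def cross_entropy(y_true, p):
    """Binary cross-entropy loss (the quantity the optimizer minimizes)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def accuracy(y_true, p):
    """Accuracy at a 0.5 threshold (the 'business' metric)."""
    return np.mean((p >= 0.5) == y_true)

y = np.array([1, 1, 0, 0])

confident = np.array([0.95, 0.90, 0.05, 0.10])   # correct and confident predictions
hesitant  = np.array([0.55, 0.60, 0.45, 0.40])   # correct but only barely

for name, p in [("confident", confident), ("hesitant", hesitant)]:
    print(f"{name}: accuracy = {accuracy(y, p):.2f}, cross-entropy = {cross_entropy(y, p):.3f}")
# Both give 100% accuracy, yet the loss differs noticeably
```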

  • You're correct that I had classification settings in mind. I'm still not sure what the value would be of calculating the cost on the test set in a classification setting, other than a curiosity. But you make a good point that in a regression setting the cost might be equivalent to the business objective. – Stephen Dec 17 '17 at 02:01
  • I'm curious what you think one would do on the validation set. You could use cost as your metric for choosing a model, but you could also use your business objectives if they are different (e.g. accuracy). – Stephen Dec 17 '17 at 02:03
  • @Stephen in classification 1) normally you wouldn't care for the test cost, as I have already implied in the answer 2) for model selection using the validation set, again you would normally use the business metric and not the cost. Generally speaking, the intuition you express in the post would be correct if you had confined the discussion to classification-only settings – desertnaut Dec 17 '17 at 08:18
  • A further thought: I guess one important reason for calculating the cost on the validation set (or even the test set), is that it can be helpful to have a common metric for comparing performance on these data sets, since that can give a sense of how much you might be overfitting. – Stephen Dec 17 '17 at 20:01
  • @Stephen Not for the test set, since this would require repeated evaluations on it (you diagnose overfitting from the divergence of the *loss curves*, not by comparing values). As for the validation set, you can, but you also can use your business metric for the same purpose – desertnaut Dec 18 '17 at 02:02
  • By loss curves do you mean the curves plotting loss against the regularization factor, or curves plotting loss against training set size (learning curves)? I guess it could be both since overfitting would be apparent in both. – Stephen Dec 18 '17 at 14:24
  • @Stephen I had in mind loss vs training iterations, like [here](https://stackoverflow.com/questions/47817424/loss-accuracy-are-these-reasonable-learning-curves/47819022#47819022) – desertnaut Dec 18 '17 at 14:29
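For illustration of the loss-curves idea discussed in the comments above, here is a rough sketch (scikit-learn's SGDRegressor on synthetic data, an assumed setup rather than anything from the thread): record the training and validation loss after every epoch, and look at whether the two curves diverge, rather than at any single value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=20, noise=20.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale features using statistics from the training split only
scaler = StandardScaler().fit(X_tr)
X_tr, X_val = scaler.transform(X_tr), scaler.transform(X_val)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
train_curve, val_curve = [], []

for epoch in range(30):
    model.partial_fit(X_tr, y_tr)               # one pass of SGD over the training data
    train_curve.append(mean_squared_error(y_tr, model.predict(X_tr)))
    val_curve.append(mean_squared_error(y_val, model.predict(X_val)))

# If the validation curve starts rising while the training curve keeps falling,
# that divergence (not any single value) is the usual sign of overfitting
for e in (0, 9, 19, 29):
    print(f"epoch {e + 1:2d}: train MSE = {train_curve[e]:10.1f}, val MSE = {val_curve[e]:10.1f}")
```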