Here the difference between the training, validation, and test sets is described. Most documentation on training neural networks uses these three sets, but they are often predefined.
I have a relatively small data set (906 3D images in total, with a balanced class distribution). I'm using the sklearn.model_selection.train_test_split function to split the data into a training set and a test set, and I pass X_test and y_test as validation data to my model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
...
history = AD_model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=100,
    verbose=1,
    validation_data=(X_test, y_test))
After training, I evaluate the model on the test set:
test_loss, test_acc = AD_model.evaluate(X_test, y_test, verbose=2)
I've seen other people take this approach too, but since the model has already seen this data as validation data during training, I'm not sure what the consequences are. What are the consequences of using the same set for validation and testing? And since my data set is already small (with overfitting as a result), is it necessary to split the data into three sets?
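For concreteness, the three-way split I'm asking about would look roughly like the sketch below. It is only an illustration under assumptions: the array is synthetic stand-in data (not my real images), the 60/20/20 proportions and the names X_trainval, X_val and y_val are my own choices, and stratify is added only to keep the classes balanced in each subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 906 tiny fake "3D images"
# with balanced binary labels, only so the sketch runs on its own.
rng = np.random.default_rng(1)
X = rng.normal(size=(906, 4, 4, 4))
Y = np.array([0, 1] * 453)

# Step 1: hold out about 20% as a test set that is never passed to fit().
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=1, stratify=Y)

# Step 2: split the remainder into training and validation data.
# 0.25 of the remaining 80% is about 20% of the full data set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1,
    stratify=y_trainval)

print(len(X_train), len(X_val), len(X_test))
```

With a split like this, I would pass validation_data=(X_val, y_val) to AD_model.fit and call AD_model.evaluate(X_test, y_test) only once, after training is finished.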