I have this piece of Python code, taken from SoloLearn:
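import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# X (features) and y (target) are NumPy arrays defined earlier
scores = []
kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X):
    # Split the data into this fold's training and test sets
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # A new model is created and fitted for every fold
    model = LogisticRegression()
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))
print(scores)
print(np.mean(scores))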
My question is: do I need to create a new model for every split? Why not create a single LogisticRegression before the for loop?
I would create it before the loop to save computation time, but since the code was presented this way, I assumed there was a reason.
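For concreteness, here is a minimal sketch of the variant I have in mind, with the model created once before the loop. The data is synthetic, generated with make_classification, and the random_state values are my own additions so the snippet runs on its own and gives repeatable splits; neither comes from the SoloLearn example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption, not the SoloLearn dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression()  # created once, before the loop
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(X_train, y_train)  # the same object is refitted on each fold
    scores.append(model.score(X_test, y_test))
print(scores)
print(np.mean(scores))
```

The only change from the original is where LogisticRegression() is instantiated; the splitting and scoring are the same.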