I am trying to have this function select the model with the higher validation accuracy as the final model, retrain that model on the training + validation set, and then compute predictions on the test set along with their accuracy. I think I have everything I need to compare the models, but I cannot think of the most appropriate way to select the best model and carry out the remaining steps, all within the function.
from sklearn.model_selection import train_test_split

def compare_models(X, y, model1, model2, test_size, val_size, random_state=0):
    # Split the data into training and test sets first (e.g. test_size=0.15 holds out 15% for testing)
    X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, random_state=random_state, test_size=test_size)
    # Split the training set again into training and validation (e.g. val_size=0.15 uses 15% of the training data)
    X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, random_state=random_state, test_size=val_size)
    # Compare the performance of the two models using the validation set
    model1.fit(X_train, y_train)
    val_preds_model1 = model1.predict(X_val)
    model2.fit(X_train, y_train)
    val_preds_model2 = model2.predict(X_val)
    # Calculate the validation accuracy of each model
    acc_val_model1 = sum(val_preds_model1 == y_val) / len(y_val)
    acc_val_model2 = sum(val_preds_model2 == y_val) / len(y_val)
    # Train our selected model on the training plus validation sets
    # (XXX is a placeholder for whichever model gets selected)
    XXX.fit(X_train_full, y_train_full)
    # Evaluate its performance on the test set
    preds_test = XXX.predict(X_test)
    acc_test = sum(preds_test == y_test) / len(y_test)
    return acc_test
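
One possible way to fill in the `XXX` step is sketched below. It keeps the two candidates in a list, records each validation accuracy, and takes the candidate with the highest score. The helper names `accuracy` and `select_and_test` are hypothetical, and the sketch assumes the splits have already been made and that each model exposes `fit` and `predict` in the usual scikit-learn style. It is an illustration of the selection logic, not the only way to do it.

```python
def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    matches = sum(1 for p, t in zip(preds, targets) if p == t)
    return matches / len(targets)


def select_and_test(model1, model2, X_train, y_train, X_val, y_val,
                    X_train_full, y_train_full, X_test, y_test):
    """Hypothetical helper: choose the better model on validation, then test it."""
    # Fit both candidates on the training split and score them on validation.
    candidates = [model1, model2]
    val_scores = []
    for model in candidates:
        model.fit(X_train, y_train)
        val_scores.append(accuracy(model.predict(X_val), y_val))

    # Pick the candidate with the higher validation accuracy.
    # max() returns the first maximum, so a tie goes to model1.
    best_index = max(range(len(candidates)), key=lambda i: val_scores[i])
    best_model = candidates[best_index]

    # Refit the winner on training + validation data, then score it on the test set.
    best_model.fit(X_train_full, y_train_full)
    acc_test = accuracy(best_model.predict(X_test), y_test)
    return best_model, val_scores, acc_test
```

Inside `compare_models`, the same idea reduces to something like `best_model = model1 if acc_val_model1 >= acc_val_model2 else model2`, followed by `best_model.fit(X_train_full, y_train_full)`. For scikit-learn estimators, calling `fit` again generally retrains from scratch unless an option such as `warm_start` is enabled, so refitting the selected object should be fine in the common case.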