
I am trying to have this function select the model with the higher validation accuracy as the final model, retrain that model on the training + validation set, and then compute predictions on the test set along with their accuracy. I think I have everything I need to compare the models, but I cannot think of the most appropriate way to select the best model and carry out the remaining steps, all within the function.

from sklearn.model_selection import train_test_split

def compare_models(X,y,model1,model2,test_size,val_size,random_state=0):
    # Split data first into training and testing, holding out test_size (e.g. 0.15) of the data for test
    X_train_full,X_test,y_train_full,y_test = train_test_split(X, y, random_state=random_state,test_size=test_size)

    # Now split the training set again into training and validation, using val_size (e.g. 0.15) of the training data for validation
    X_train,X_val,y_train,y_val = train_test_split(X_train_full,y_train_full,random_state=random_state,test_size=val_size)

    # Compare the performance of the two models using the validation set
    model1.fit(X_train,y_train)
    val_preds_model1 = model1.predict(X_val)

    model2.fit(X_train,y_train)
    val_preds_model2 = model2.predict(X_val)
    # Calculate the validation accuracy of each model
    acc_val_model1 = sum(val_preds_model1==y_val)/len(y_val)
    acc_val_model2 = sum(val_preds_model2==y_val)/len(y_val)
    
    # Train our selected model on the training plus validation sets
    XXX.fit(X_train_full,y_train_full)

    # Evaluate its performance on the test set
    preds_test = XXX.predict(X_test)
    acc_test = sum(preds_test==y_test)/len(y_test)
    return acc_test
sourab maity
diesmiling

1 Answer


Calling fit again on a model refits it, so you can reuse the selected model object directly. Something like this should do it:

XXX = model1 if acc_val_model1 > acc_val_model2 else model2

But maybe call it best_model or something, not XXX :P
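Putting the pieces together, here is a rough sketch of what the whole function might look like. A few things in it are my own choices rather than anything from the question: the 0.15 defaults, using np.mean for the accuracies, and returning the fitted model together with the test accuracy (the original returns only acc_test). It also assumes both models follow the usual scikit-learn fit/predict interface.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def compare_models(X, y, model1, model2, test_size=0.15, val_size=0.15, random_state=0):
    # Hold out the test set first (defaults of 0.15 are an assumption).
    X_train_full, X_test, y_train_full, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Split what is left into training and validation sets.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_full, y_train_full, test_size=val_size, random_state=random_state
    )

    # Fit both candidates on the training set and score them on the validation set.
    model1.fit(X_train, y_train)
    acc_val_model1 = np.mean(model1.predict(X_val) == y_val)

    model2.fit(X_train, y_train)
    acc_val_model2 = np.mean(model2.predict(X_val) == y_val)

    # Select the model with the higher validation accuracy (ties go to model2).
    best_model = model1 if acc_val_model1 > acc_val_model2 else model2

    # Refit the selected model on training + validation data.
    best_model.fit(X_train_full, y_train_full)

    # Evaluate the refitted model on the untouched test set.
    acc_test = np.mean(best_model.predict(X_test) == y_test)

    # Returning the model as well as the accuracy is a choice made for this sketch.
    return best_model, acc_test
```

You would call it as best_model, acc_test = compare_models(X, y, model1, model2). Note that the returned object is one of the two models you passed in, now fitted on the training + validation data.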

Matt Hall