How to change decision threshold on a loaded logistic regression model

Question

I'm working on a logistic regression model in Python and managed to adjust the decision threshold manually. However, when I save the model with pickle, the threshold doesn't seem to carry over: I get exactly the same results for different thresholds. Here's the code:

filename = 'model202104.sav'
pickle.dump(logreg, open(filename, 'wb'))
loaded_model2 = pickle.load(open(filename, 'rb'))
result = loaded_model2.score(X_test, y_pred)
print(result)

Here's the code I use to change the threshold manually:

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.2,random_state=7)
logreg = LogisticRegression(max_iter=10000)
logreg.fit(X_train,y_train)
#y_pred=logreg.predict(X_test)
THRESHOLD=0.5
y_pred=np.where(logreg.predict_proba(X_test)[:,1] > THRESHOLD, 1, 0)

Thanks in advance :)

I don't quite understand. Are you trying to save your manual threshold to the .sav file? Can you show the code you used to create the .sav file? – darth baba, Jun 23 '21 at 17:43
Yes, that's what I'm trying to do. The code I use to create the .sav file is already posted in the question. – LCBM, Jun 23 '21 at 17:49

Arturo Sbr · Accepted Answer · 2021-06-23T17:46:30.543

The second argument for score is supposed to be the true observed values, not y_pred.

# Load model
import pickle

with open('model202104.sav', 'rb') as f:
    loaded_model2 = pickle.load(f)

# Score model with `y_test`
result = loaded_model2.score(X_test, y_test)  # You had `y_pred` here
print(result)

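Moreover, sklearn has no parameter for the decision threshold, so a custom threshold always has to be applied manually. By default, LogisticRegression.predict classifies an observation as 1 when its predicted probability is above 0.5. The threshold you applied with np.where is never stored on the logreg object, so pickling the model cannot save it. To score your model with a custom threshold: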

# Import NumPy and the accuracy score function
import numpy as np
from sklearn.metrics import accuracy_score

# Classify with custom threshold (for example, 0.85)
thr = 0.85
y_pred = np.where(loaded_model2.predict_proba(X_test)[:, 1] >= thr, 1, 0)

# Score against the true labels
print(f'Accuracy with threshold set to {thr}:', accuracy_score(y_test, y_pred))

edited Jun 23 '21 at 17:46

answered Jun 23 '21 at 17:40

Thank you for your response. Is it possible to change that threshold for the loaded model? What if I want my model to use 0.85 to make predictions when I test it on other data? – LCBM Jun 23 '21 at 17:58
You have to use the code in the second snippet (or something similar) to manually calculate the `score` (which is the mean accuracy by default). You cannot change it with a parameter. [As you can see here](https://stackoverflow.com/questions/28716241/controlling-the-threshold-in-logistic-regression-in-scikit-learn), sklearn always uses 0.5 as the threshold. – Arturo Sbr Jun 23 '21 at 18:04
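
Since the threshold is not part of the fitted estimator, one possible workaround is to pickle the model and the threshold together. The sketch below is only an illustration, not something from the answer above: `ThresholdedModel`, `save_with_threshold` and `load_with_threshold` are hypothetical names, and the wrapper assumes a binary classifier whose `predict_proba` returns the positive-class probability in column 1 (as `LogisticRegression` does). It returns a plain list of 0/1 labels, which `accuracy_score` accepts.

```python
import pickle


class ThresholdedModel:
    """Hypothetical helper: pairs a fitted binary classifier with a cutoff."""

    def __init__(self, model, threshold=0.5):
        self.model = model          # any fitted object exposing predict_proba
        self.threshold = threshold  # stored on the wrapper, so pickle keeps it

    def predict(self, X):
        # Column 1 of predict_proba is assumed to be the positive-class probability
        proba = self.model.predict_proba(X)
        return [1 if row[1] >= self.threshold else 0 for row in proba]


def save_with_threshold(model, threshold, path):
    """Pickle the model and its threshold together in a single file."""
    with open(path, "wb") as f:
        pickle.dump(ThresholdedModel(model, threshold), f)


def load_with_threshold(path):
    """Load the wrapper back; its predict() applies the saved threshold."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this, `save_with_threshold(logreg, 0.85, 'model202104.sav')` followed by `load_with_threshold('model202104.sav').predict(X_test)` would classify with 0.85 on any new data. Note that pickle needs the `ThresholdedModel` class to be importable in the script that loads the file. Recent scikit-learn releases (1.5 and later) also provide `sklearn.model_selection.FixedThresholdClassifier`, which may cover the same need if upgrading is an option.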
