I'm trying to find the best-calibrated model for a binary outcome using logistic regression, and I'm using log loss and Brier score to decide which model is best. What I don't understand is this: when I pass the same variables in X in a different order, I get slightly different results. What am I doing wrong?
This is my code:
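import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
# df is my DataFrame, already loaded
X = df[['casaexp','k5m','side','d5m','g5']]
y = df.victory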
train_perc = 0.4
val_perc = 0.3
test_perc = 0.3
rs = 1234
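# Hold out test_perc of the rows for testing
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=test_perc, random_state=rs)
# Split the rest into train/validation (val_perc is a fraction of the remaining rows, not of the full data)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=val_perc, random_state=rs, stratify=y_train_val)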
def shuffle_data(data, n):
    return np.hstack((data[:,n:], data[:,:n]))
lgbm_clf = LogisticRegression(penalty='none', random_state=1234)
lgbm_clf.fit(X_train, y_train)
preds_uncalibrated_test = lgbm_clf.predict_proba(X_test)[:,1]
print('Uncalibrated log_loss = {}'.format(log_loss(y_test, preds_uncalibrated_test)))
print('Uncalibrated ROC_AUC = {}'.format(roc_auc_score(y_test, preds_uncalibrated_test)))
print(f"Uncalibrated F1 on the test set: {f1_score(y_test, lgbm_clf.predict(X_test)):.5f}")
print('Uncalibrated brier_score_loss = {}'.format(brier_score_loss(y_test, preds_uncalibrated_test)))
accuracy = accuracy_score(y_test, lgbm_clf.predict(X_test))
print(' %0.4f accuracy.' % accuracy)
And this is my result:
Uncalibrated log_loss = 0.5660449696**400124**
Uncalibrated ROC_AUC = 0.7710594315245478
Uncalibrated F1 on the test set: 0.69663
Uncalibrated brier_score_loss = 0.19366689455**770922**
0.6932 accuracy.
But if I change the order of the columns, I get a slightly different result:
X = df[['k5m','side','d5m','casaexp','g5']]
y = df.victory
Uncalibrated log_loss = 0.5660449696**409053**
Uncalibrated ROC_AUC = 0.7710594315245478
Uncalibrated F1 on the test set: 0.69663
Uncalibrated brier_score_loss = 0.19366689455**804897**
0.6932 accuracy.
The code is identical; only the order of the variables in X differs. Why does this happen, and what do I need to do to prevent it?
I tried setting a random state and I tried removing the train/test split, but the same "error" still occurs. I want to get the same result regardless of the column order.
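To put a number on how small the difference is, here is a minimal sketch that compares the metric values from the two runs above using only the Python standard library. The names `compare_runs` and `REL_TOL` and the tolerance of 1e-9 are my own choices for illustration; they are not part of scikit-learn.

```python
import math

# Metric values copied from the two runs reported above.
run_a = {"log_loss": 0.5660449696400124, "brier_score_loss": 0.19366689455770922}
run_b = {"log_loss": 0.5660449696409053, "brier_score_loss": 0.19366689455804897}

# Assumed tolerance, chosen for illustration only.
REL_TOL = 1e-9


def compare_runs(a, b, rel_tol=REL_TOL):
    """Return {metric: (absolute difference, equal within rel_tol)}."""
    report = {}
    for name in a:
        diff = abs(a[name] - b[name])
        report[name] = (diff, math.isclose(a[name], b[name], rel_tol=rel_tol))
    return report


report = compare_runs(run_a, run_b)
for name, (diff, close) in report.items():
    print(f"{name}: abs diff = {diff:.3e}, equal within rel_tol={REL_TOL}: {close}")
```

Both metrics differ only from about the twelfth decimal place onward, so a tolerance-based comparison like this treats the two runs as equal, while an exact `==` comparison does not.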