
I'm trying to find the best-calibrated model for a binary outcome using logistic regression, and I'm using log loss and the Brier score to decide which model is best. I'd like to know what I'm doing wrong: when I pass the same X variables in a different order, I get different results.

This is my code:

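import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X = df[['casaexp','k5m','side','d5m','g5']]
y = df.victory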

train_perc = 0.4
val_perc = 0.3
test_perc = 0.3
rs = 1234

X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=test_perc, random_state=rs)

X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=val_perc, random_state=rs,stratify=y_train_val)


def shuffle_data(data, n):
    return np.hstack((data[:,n:], data[:,:n]))

# Note: penalty='none' was removed in scikit-learn 1.4; newer versions expect penalty=None.
lgbm_clf = LogisticRegression(penalty='none', random_state=1234)
lgbm_clf.fit(X_train, y_train)

preds_uncalibrated_test = lgbm_clf.predict_proba(X_test)[:,1]

print('Uncalibrated log_loss = {}'.format(log_loss(y_test, preds_uncalibrated_test)))
print('Uncalibrated ROC_AUC = {}'.format(roc_auc_score(y_test, preds_uncalibrated_test)))
print(f"Uncalibrated F1 on the test set: {f1_score(y_test, lgbm_clf.predict(X_test)):.5f}")
print('Uncalibrated brier_score_loss = {}'.format(brier_score_loss(y_test, preds_uncalibrated_test)))
accuracy = accuracy_score(y_test, lgbm_clf.predict(X_test))
print(' %0.4f accuracy.' % accuracy)
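A side note on the splits in the code above: because `val_perc` is applied to the 70% that remains after the test split, the posted code appears to produce roughly 49% train / 21% validation / 30% test rather than 40/30/30. The sketch below illustrates the arithmetic on placeholder data; `X_demo` and `y_demo` are hypothetical stand-ins for `df`, which is not shown, and stratifying the first split is an assumption added here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

train_perc, val_perc, test_perc = 0.4, 0.3, 0.3
rs = 1234

# Hypothetical placeholder data: 1000 rows with a balanced binary target.
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.tile([0, 1], 500)

# First split: hold out the test set (30% of all rows).
X_tv, X_te, y_tv, y_te = train_test_split(
    X_demo, y_demo, test_size=test_perc, random_state=rs, stratify=y_demo
)

# As posted: test_size=val_perc takes 30% of the remaining 70%, about 21% overall.
_, X_val_posted, _, _ = train_test_split(
    X_tv, y_tv, test_size=val_perc, random_state=rs, stratify=y_tv
)

# Rescaled: express the validation share relative to the train+val pool.
val_share = val_perc / (train_perc + val_perc)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_tv, y_tv, test_size=val_share, random_state=rs, stratify=y_tv
)

print("as posted:", len(X_tv) - len(X_val_posted), len(X_val_posted), len(X_te))
print("rescaled: ", len(X_tr), len(X_val), len(X_te))
```

This does not affect the column-order question, since both runs use the same splits.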

And this is my result:

Uncalibrated log_loss = 0.5660449696**400124**
Uncalibrated ROC_AUC = 0.7710594315245478
Uncalibrated F1 on the test set: 0.69663
Uncalibrated brier_score_loss = 0.19366689455**770922**
 0.6932 accuracy.

But if I change the column order, I get slightly different results:

X = df[['k5m','side','d5m','casaexp','g5']]
y = df.victory

Uncalibrated log_loss = 0.5660449696**409053**
Uncalibrated ROC_AUC = 0.7710594315245478
Uncalibrated F1 on the test set: 0.69663
Uncalibrated brier_score_loss = 0.19366689455**804897**
 0.6932 accuracy.

This is the same code, just with the columns of X in a different order. Why does this happen, and what do I need to do to prevent it?

I tried setting random_state, and I tried removing the train/test split, but the same "error" still occurs. I want to get identical results regardless of column order.
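For anyone who wants to reproduce this without the original data, here is a minimal sketch on synthetic data. The outputs above differ only around the 12th decimal place, which is the scale at which floating-point summation order and an iterative solver's stopping tolerance can matter; that reading is an assumption and has not been verified on the original `df`. The sketch uses `make_classification` as a hypothetical stand-in for the data, a helper named `fit_and_score` invented for illustration, and the default L2 penalty (not `penalty='none'` as in the posted code) so it runs across scikit-learn versions.

```python
import math

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df: 5 numeric features, binary target.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0, random_state=1234
)


def fit_and_score(X, y, column_order):
    """Fit on X with its columns permuted; return (log loss, Brier score) on the test split."""
    X_perm = X[:, column_order]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_perm, y, test_size=0.3, random_state=1234, stratify=y
    )
    # Default settings (L2 penalty, lbfgs solver); an assumption, see the note above.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    return log_loss(y_te, proba), brier_score_loss(y_te, proba)


ll_a, brier_a = fit_and_score(X_demo, y_demo, [0, 1, 2, 3, 4])
ll_b, brier_b = fit_and_score(X_demo, y_demo, [1, 2, 3, 0, 4])

# Compare with a tolerance instead of expecting bit-identical floats.
print("log loss difference:", abs(ll_a - ll_b))
print("Brier difference:   ", abs(brier_a - brier_b))
print("equal within rel_tol=1e-6:", math.isclose(ll_a, ll_b, rel_tol=1e-6))
```

If the differences stay this small, rounding the reported metrics (for example to 6 decimals) or comparing with a tolerance may be enough in practice when ranking models.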

lep139