I am trying to calculate the ROC AUC score for a logistic regression model using the following code:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# x (features) and y (0/1 labels) are loaded earlier
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
lr_model = LogisticRegression(solver='liblinear', random_state=0).fit(x_train, y_train)

# predicted class probabilities for the test set
lr_test_probs = lr_model.predict_proba(x_test)
lr_auc = roc_auc_score(y_test, lr_test_probs)
```
When I run this, I get the following error:

```
ValueError: y should be a 1d array, got an array of shape (423, 2) instead.
```
I think this is caused by y_test, which has shape (423, 2): 423 rows and 2 columns, the first column being the index. I tried to get rid of the index by using to_numpy(), like so:
```python
y_test2 = y_test.to_numpy()
y_train2 = y_train.to_numpy()
x_test2 = x_test.to_numpy()
x_train2 = x_train.to_numpy()
```
but I still got the same error, even though y_test2 is now simply an array of 0s and 1s with no index column. I also tried converting x and y to arrays with to_numpy() first and only then splitting them into training and test sets, and I still get the same error.
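One possible reason to_numpy() changes nothing here can be checked with a small pandas sketch (the Series below is a hypothetical stand-in for the question's y, with an illustrative column name "label"): a Series index is stored as metadata rather than as a column, so a Series of labels already has shape (n,), and to_numpy() keeps it 1-D. An (n, 2) shape would only come from a two-column DataFrame or some other 2-D array.

```python
import pandas as pd

# Hypothetical stand-in for y: a Series of 0/1 labels with a shuffled index,
# like the one train_test_split returns for a pandas input.
y_demo = pd.Series([0, 1, 1, 0, 1], index=[10, 42, 7, 3, 99], name="label")

# The index is not a column, so the Series is already 1-D.
print(y_demo.shape)             # (5,)
print(y_demo.to_numpy().shape)  # (5,)

# A genuinely 2-D target (two selected columns) has shape (n, 2) instead.
y_2d = pd.DataFrame({"id": [10, 42, 7, 3, 99], "label": [0, 1, 1, 0, 1]})
print(y_2d.shape)               # (5, 2)
```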
Any advice on what to do?
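The sketch below uses synthetic data from make_classification (an assumption; the question's x and y are not shown) to print the shapes of both arguments passed to roc_auc_score. For a binary target, LogisticRegression.predict_proba returns one column per class, so the score array has two columns even when y_test is 1-D. Selecting the positive-class column with [:, 1] gives the 1-D score array that roc_auc_score expects for binary labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the question's x and y (assumption).
x, y = make_classification(n_samples=1000, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

lr_model = LogisticRegression(solver="liblinear", random_state=0).fit(x_train, y_train)
lr_test_probs = lr_model.predict_proba(x_test)

# y_test is 1-D; predict_proba has one column per class: (n_test, 2).
print(y_test.shape, lr_test_probs.shape)

# Passing the full two-column array reproduces the error from the question.
try:
    roc_auc_score(y_test, lr_test_probs)
except ValueError as err:
    print("2-D scores rejected:", err)

# Column 1 holds the probability of lr_model.classes_[1], the positive class.
lr_auc = roc_auc_score(y_test, lr_test_probs[:, 1])
print(lr_auc)
```

For 0/1 labels, column 1 corresponds to class 1. lr_model.decision_function(x_test) is another 1-D score; since it is monotonic in the class-1 probability, it gives the same AUC.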
Edit: running print(y.head(5)) gives: