I am trying to build a predictive model for multiclass data, using roc_auc_score for validation. When the data is split into validation and test sets and cross-validated, there is no error. However, when the same dataset is cross-validated using the PSO best parameters, I get the error below.
```python
# Train/validation/test split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=2021)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=0.5, random_state=2021)
```
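This error usually means one of the subsets produced by the two-step split is missing a class. A quick way to check, sketched here on a synthetic 3-class dataset standing in for the real one (names are illustrative), is to compare class counts across the three subsets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the real dataset
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=2021)

# Same two-step split as in the question
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=2021)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=0.5, random_state=2021)

# Every subset should contain all classes; if y_test is missing one,
# roc_auc_score raises the "Number of classes" ValueError
for name, part in [('train', y_train), ('val', y_val), ('test', y_test)]:
    print(name, np.unique(part, return_counts=True))
```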
Fitting the model:
```python
# Fit SVM classifier with the PSO best parameters
clf_pso = SVC(kernel='rbf', probability=True,
              C=para_pso.best_param['C'], gamma=para_pso.best_param['gamma'])
clf_pso.fit(X_train, y_train)
print('(Cross Validation) AUC Score:',
      np.mean(cross_val_score(estimator=clf_pso, X=X_train, y=y_train, cv=5, scoring='roc_auc')))

# Print test-set AUC score
print('(Test set) AUC Score:',
      roc_auc_score(y_test, clf_pso.predict_proba(X_test), average='macro', multi_class='ovo'))
```
Output error:

```
ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'
```
The error is raised by this line:

```python
print('(Test set) AUC Score:',
      roc_auc_score(y_test, clf_pso.predict_proba(X_test), average='macro', multi_class='ovo'))
```
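For context, the error can be reproduced directly: `roc_auc_score` compares the columns of `y_score` (one per class the classifier saw during `fit`) against the classes actually present in `y_true`, and raises when the counts differ. A minimal sketch with toy arrays (not the original data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true contains only 2 distinct classes, but y_score has 3 probability columns
y_true = np.array([0, 1, 0, 1])
y_score = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.2, 0.7, 0.1]])

try:
    roc_auc_score(y_true, y_score, average='macro', multi_class='ovo')
except ValueError as e:
    print(e)  # the same "Number of classes in y_true ..." error
```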
The code below, on the same dataset, runs without error:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2021)

# Fit SVM classifier with default parameters
clf_default = SVC(kernel='rbf', probability=True)
clf_default.fit(X_train, y_train)
print('(Cross Validation) AUC Score:',
      np.mean(cross_val_score(estimator=clf_default, X=X_train, y=y_train, cv=5, scoring='roc_auc')))

# Print test-set AUC score
print('(Test set) AUC Score:',
      roc_auc_score(y_test, clf_default.predict_proba(X_test), average='macro', multi_class='ovo'))
```
Using `stratify=y` does not help in either case.
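For what it's worth, one approach that is sometimes used with this error is to stratify both splits (the second one on `y_val`, not `y`) and to pass the full class set to `roc_auc_score` via its `labels` parameter, which pins the `predict_proba` column order to the classifier's classes. A sketch on synthetic stand-in data (the real `X`, `y`, and PSO parameters are assumed):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=2021)

# Stratify BOTH splits so each subset keeps every class
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.4, random_state=2021, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_val, y_val, test_size=0.5, random_state=2021, stratify=y_val)

clf = SVC(kernel='rbf', probability=True).fit(X_train, y_train)

# labels ties the y_score columns to the classifier's class set explicitly
score = roc_auc_score(y_test, clf.predict_proba(X_test),
                      average='macro', multi_class='ovo',
                      labels=clf.classes_)
print('(Test set) AUC Score:', score)
```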