My data source: https://www.kaggle.com/mlg-ulb/creditcardfraud
The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
I am using the PRROC package to get the AUC of the ROC curve. Here is my random forest code:
rf.model <- randomForest(Class ~ ., data = training, ntree = 2000, nodesize = 20)
rf_pred <- predict(rf.model, test, type = "prob")
So, as expected, rf_pred should contain the probability of each class.
Then I used the following code:
fg_rf <- rf_pred[test$Class==1]
bg_rf <- rf_pred[test$Class==0]
roc_rf <- roc.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)
However, the ROC curve did not turn out as I expected.
The same problem occurred with the PR curve. Is it because of the high class imbalance?
Also, assuming rf_pred holds the probabilities of both classes (0 and 1), how can I make fg_rf contain only the probability of Class = 1? Is my code:
fg_rf <- rf_pred[test$Class==1]
correct?
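
To make the indexing question concrete, here is a minimal sketch of how logical indexing behaves on a two-column probability matrix. It assumes rf_pred is a matrix with one row per test case and one column per class, named "0" and "1" after the levels of Class (this is what I would expect from predict with type = "prob" when Class is a factor, but I have not verified it on this dataset). The names rf_pred_demo, class_demo, mixed, p_fraud, fg_demo and bg_demo are made up for illustration, and the numbers are not real model output.

```r
# Hypothetical stand-in for rf_pred: rows are test cases,
# columns are the class levels "0" and "1".
rf_pred_demo <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.7, 0.3,
    0.4, 0.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("0", "1"))
)
class_demo <- factor(c(0, 1, 0, 1))  # hypothetical stand-in for test$Class

# Indexing the matrix with a single logical vector treats it as one long
# vector (column by column) and recycles the index over both columns,
# so the result mixes P(Class = 0) and P(Class = 1) values.
mixed <- rf_pred_demo[class_demo == 1]

# Selecting the "1" column first keeps only P(Class = 1) per row.
p_fraud <- rf_pred_demo[, "1"]
fg_demo <- p_fraud[class_demo == 1]  # scores of the true frauds
bg_demo <- p_fraud[class_demo == 0]  # scores of the true non-frauds
```

If that assumption holds for rf_pred, the same column selection applied before splitting by test$Class would give the two score vectors that roc.curve expects as scores.class0 and scores.class1.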