I am looking at a dataset with one continuous independent variable (Quant) and one binary dependent variable (Binary). I used a multinomial model to predict the binary value from the continuous independent variable, and I was hoping to make a ROC curve. The code is below:

mymodel <- multinom(Quant~., data = dataset)
pred <- predict(mymodel,dataset)
roc_pred <- prediction(pred,dataset$Binary)
roc <- performance(roc_pred,"tpr","fpr")

Right now, if I run this code, I get the following error message: "Format of predictions is invalid." I'm not sure why my pred object doesn't satisfy the requirements of the prediction function. The only way I can get this to work is to use the following line of code instead: pred <- predict(mymodel,dataset,type="prob")

However, this is getting me some strange values in the pred matrix. As my dependent variable is binary, I am expecting to get either a value of 0 or 1 in my pred variable (which is what I get with the original line of code), but when I add the type="prob", it gives me a 0.3 value for all of the observations where the independent variable (Quant) is equal to 0. What is the type="prob" changing, and why can't I just use the original line of code to get my ROC curve? Thank you.

Calimo
Byakko
  • It's easier to help if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run and test your code. – MrFlick Jul 27 '17 at 02:36
  • Figured it out! Turns out my dependent variable wasn't categorized as a numerical variable due to some iffiness from the Excel import. The problem wasn't the prediction function...it was actually going back to the multinomial model – Byakko Jul 27 '17 at 03:05

1 Answer

type="prob" returns predicted probabilities rather than class labels. To get a binary outcome, you need to convert each probability by applying a threshold. This can be achieved by

pred <- predict(mymodel,dataset,type="prob")

# initialize as zero
pred_binary <- integer(length(pred))

# if the probability exceeds 0.5, treat that as 1
pred_binary[pred > 0.5] <- 1L

Then pred_binary is your desired binary outcome. Here the threshold is 0.5; you may change it depending on your situation. Most people start with 0.5 and adjust the threshold if necessary, often when the dataset is imbalanced.
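For reference, here is a minimal sketch of how the two prediction types compare, and how the probabilities can feed a ROC curve. It assumes the nnet and ROCR packages are installed, uses simulated data (not the asker's dataset), and assumes the model formula is Binary ~ Quant with Binary stored as a two-level factor. As far as I can tell, ROCR's prediction() wants numeric scores, which would explain why the default class labels (a factor) are rejected while the probabilities are accepted.

```r
library(nnet)   # multinom()
library(ROCR)   # prediction(), performance()

# Simulated data: an assumption for illustration only
set.seed(42)
n <- 200
dataset <- data.frame(Quant = runif(n, 0, 10))
dataset$Binary <- factor(rbinom(n, 1, plogis(-2 + 0.5 * dataset$Quant)))

# Model the binary outcome from the continuous predictor
mymodel <- multinom(Binary ~ Quant, data = dataset, trace = FALSE)

# Default (type = "class"): hard labels, returned as a factor
pred_class <- predict(mymodel, dataset)

# type = "prob": with a two-level outcome, one probability per row
# (the probability of the second factor level, here "1")
pred_prob <- as.numeric(predict(mymodel, dataset, type = "prob"))

# Thresholding the probabilities at 0.5 gives 0/1 labels
pred_binary <- as.integer(pred_prob > 0.5)

# A ROC curve sweeps over all thresholds, so it takes the probabilities
roc_pred <- prediction(pred_prob, dataset$Binary)
roc <- performance(roc_pred, "tpr", "fpr")
auc <- performance(roc_pred, "auc")@y.values[[1]]
```

Calling plot(roc) then draws the curve. In this sketch, pred_binary should agree with pred_class, since the class prediction for a two-level outcome is the probability cut at 0.5.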

Sal