I trained a binomial model with a call along the lines of glm(cbind(Response, n - Response) ~ features, data = train, family = binomial), where the response is a two-column matrix of counts (yes) and counts (no).
The test responses I've held out are in the same response-matrix form. However, predict() returns a single value for each row of data (a log-odds by default, or a probability with type = "response"). I now want to use the ROCR or AUC package to draw ROC curves and compute the AUC, but my predictions and observations are in different formats. Does anyone know how to do this?
OK, adding an example. Forgive it being meaningless, rank-deficient, and small; I only want to illustrate my case.
plants <- c('Cactus', 'Tree', 'Cactus', 'Tree', 'Flower', 'Tree', 'Tree')
sun <- c('Full', 'Half', 'Half', 'Full', 'Full', 'Half', 'Full')
water <- c('N', 'Y', 'Y', 'N', 'Y', 'N', 'N')
died <- c(10, 10, 8, 2, 15, 20, 12)
didntdie <- c(2, 10, 8, 20, 10, 10, 10)
df <- data.frame(died, didntdie, plants, sun, water)
dftrain <- head(df, 5)
dftest <- tail(df, 2)
model <- glm("cbind(died, didntdie) ~ plants + sun + water", data=dftrain, family="binomial")
At this point, predict(model, dftest) returns the log-odds of death for the final two sets of features in my data frame (adding type = "response" gives the probability of death instead). Now I wish to compute a ROC curve and its AUC. My observations are in dftest[c('died','didntdie')]. My predictions are essentially probabilities. AUC, ROCR, etc. expect one prediction score and one binary (Bernoulli) outcome per observation. I can't find documentation on how to use this response matrix instead. Any help appreciated.
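To make the format mismatch concrete, here is a minimal sketch of one possible workaround. It assumes it is acceptable to treat every counted plant as its own Bernoulli trial that shares its row's predicted probability. The helpers expand_grouped() and auc_rank() are hypothetical names written for this illustration and do not come from ROCR or any other package. The AUC is computed with the rank-based (Mann-Whitney) formula in base R.

```r
# Toy data from the example above; the fit is rank-deficient, as already noted.
plants <- c('Cactus', 'Tree', 'Cactus', 'Tree', 'Flower', 'Tree', 'Tree')
sun <- c('Full', 'Half', 'Half', 'Full', 'Full', 'Half', 'Full')
water <- c('N', 'Y', 'Y', 'N', 'Y', 'N', 'N')
died <- c(10, 10, 8, 2, 15, 20, 12)
didntdie <- c(2, 10, 8, 20, 10, 10, 10)
df <- data.frame(died, didntdie, plants, sun, water)
dftrain <- head(df, 5)
dftest <- tail(df, 2)
model <- glm(cbind(died, didntdie) ~ plants + sun + water,
             data = dftrain, family = binomial)

# type = "response" returns probabilities rather than log-odds.
# suppressWarnings() only hides the rank-deficiency warning of this toy fit.
p_test <- suppressWarnings(predict(model, dftest, type = "response"))

# Hypothetical helper: turn grouped counts into one row per individual trial.
# Each trial inherits the score of the grouped row it came from.
expand_grouped <- function(score, n_pos, n_neg) {
  data.frame(
    score = rep(as.numeric(score), times = n_pos + n_neg),
    label = unlist(Map(function(p, q) rep(c(1L, 0L), times = c(p, q)),
                       n_pos, n_neg), use.names = FALSE)
  )
}

# Hypothetical helper: rank-based (Mann-Whitney) AUC.
# rank() assigns midranks, so tied scores count as half-concordant.
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

ex <- expand_grouped(p_test, dftest$died, dftest$didntdie)
auc_rank(ex$score, ex$label)
```

The expanded ex$score and ex$label vectors are in the one-score-per-observation form that ROCR's prediction(predictions, labels) takes, so they could presumably be passed there as well; that step is not run here. Because every trial within a grouped row has the same score, the result contains many ties, which is expected with this kind of data.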