3

I've called a multiple logistic regression as follows:

step_1 <- glm(CD3LR ~ alb + surg + ASA_opt + BMI + bil + Hb_cat + MDRD + sex + DM
              + age + Path + Smoking,
              na.action = na.exclude, family = binomial)

When I want to test the model by creating a ROC curve, I use the pROC package and call:

roc(CD3LR, step_1$fitted.values, plot=FALSE)

However this returns the error:

> roc(CD3LR, step_1fitted.values, plot=FALSE) 
Setting levels:control=0,case=1
Error in roc.default(CD3LR,step_1fitted.values, plot = FALSE) : 
  Response and predictor must be vectors of the same length.

I think this is because there are 3 missing values for the MDRD variable and because I've used na.exclude this results in 3 fewer step_1$fitted.values than I have for CD3LR

Is there a way to generate a ROC curve using only the CD3LR values that correspond to the step_1$fitted.values??

Very grateful for any help!

Calimo
  • 7,510
  • 4
  • 39
  • 61
donm79
  • 31
  • 1
  • 3

2 Answers2

2

Your intuition is correct, some values have been filtered out by na.action=na.exclude.

Typically I would recommend using the predict function to obtain new predictions on the data. They look like this:

> predict(step_1)
1           2           3           4           5           6           7   ...
NA          NA  1.04059269  0.60248768  0.81502210  0.23992288  0.08421514  ...

As you can see, the missing values in MDRD produce an NA, rather than being removed.

You can then feed these predictions directly to the roc function:

roc(CD3LR, predict(step_1))
Calimo
  • 7,510
  • 4
  • 39
  • 61
  • Does the following also work ok for the roc call? roc(step_1$y,step_1$fitted.values,plot=FALSE). When I changed CD3LR to step_1$y it seems to work perfectly! – donm79 Feb 06 '20 at 09:59
  • @donm79 sure, but I would recommend using predict. Then you can pass it a test dataset with `predict(step_1, newdata = TestDataFrame)` without changing anything. – Calimo Feb 06 '20 at 17:58
1

I notice you have all your variables in the environment, for example like this, and below I introduce 3NA for sex and 2 NA for BMI,

CD3LR = as.numeric(runif(100)>0.5)
alb = rnorm(100)
surg = sample(1:3,100,replace=TRUE)
ASA_opt = rpois(100,50)
BMI = c(NA,NA,rpois(98,100))
bil = rnorm(100)
Hb_cat = sample(1:3,100,replace=TRUE)
MDRD = runif(100)
sex = c(sample(c("M","F"),98,replace=TRUE),NA,NA)
DM = rnorm(100)
age = sample(20:60,100,replace=TRUE)
Path = rnorm(100)
Smoking = sample(c("Yes","NI"),100,replace=TRUE)

So best is to put all of them into a data.frame, do the fit and then the roc curve:

DataFrame = data.frame(
CD3LR,alb,surg,ASA_opt,BMI,bil,Hb_cat,
MDRD,sex,DM,age,Path,Smoking) 

step_1<-glm(CD3LR~alb+surg+ASA_opt+BMI+bil+Hb_cat+MDRD+sex+DM+age+Path+Smoking,
data=DataFrame,na.action=na.exclude,family=binomial)

Thanks to @Calimo for pointing it out, you can also fit with:

step_1<-glm(CD3LR~.,
data=DataFrame,na.action=na.exclude,family=binomial)

This gives you the error:

roc(DataFrame$CD3LR,step_1$fitted.values,plot=FALSE)

We can do:

roc(step_1$y,step_1$fitted.values,plot=FALSE)

Or:

roc(DataFrame[complete.cases(DataFrame),"CD3LR"],step_1$fitted.values,plot=FALSE)

The reason for having a data frame is that you can immediately see the complete set of data which is non NA for all. If you have any NAs in the predictor or response, it will not be used in the regression.

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Now that you have this nice data.frame, you can simplify your formula to `CD3LR ~ .` to make it a bit more readable... and also maybe some spaces after the commas? – Calimo Feb 05 '20 at 20:08
  • Thank you so much for the help! I had read in the data from Excel using readxl and I just defined all the variables in the environment. I think where I went wrong was in my roc call: roc(CD3LR,step_1$fitted.values,plot=FALSE). When I changed CD3LR to step_1$y it worked perfectly! – donm79 Feb 06 '20 at 09:59
  • Cool :) If you feel satisfy by this answer, feel free to accept it and/or upvote it. More information on that here: stackoverflow.com/help/someone-answers – StupidWolf Feb 06 '20 at 10:03