2

I'm attempting to evaluate the goodness of fit of a logistic regression model I have constructed. Initially, it was recommended that I use the Hosmer-Lemeshow test, but upon further research, I learned that it is not as reliable as the omnibus goodness of fit test as indicated by Hosmer et al. It is my understanding that residual.lrm in the R rms package is the method to run the le Cessie - van Houwelingen - Copas - Hosmer unweighted sum of squares test.

I have constructed the following model:

> NEDOCModel <- glm(complication ~ ultrasound + fNEDOC, family = "binomial", data = modelmain);
> summary(NEDOCModel);

Call:
glm(formula = complication ~ ultrasound + fNEDOC, family = "binomial", 
data = modelmain, x = TRUE, y = TRUE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5841  -0.5812  -0.4899  -0.4899   2.0878  

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   -1.69293    0.10126 -16.719   <2e-16 ***
ultrasound1                   -0.36661    0.12514  -2.929   0.0034 **
fNEDOCOvercrowded (140 - 200)  0.01087    0.13524   0.080   0.9359    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1765.6  on 2284  degrees of freedom
Residual deviance: 1757.1  on 2282  degrees of freedom

AIC: 1763.1

Number of Fisher Scoring iterations: 4

where complication is a binary outcome (0 or 1), ultrasound and fNEDOC are binary predictors (0 or 1).

Following the description (and examples) for the residual.lrm function, I receive the following error:

> resid(NEDOCModel, "gof");
Error in match.arg(type) : 

'arg' should be one of “deviance”, “pearson”, “working”, “response”, “partial”

Being an amateur and relatively new to this field, I would appreciate any assistance in resolving this error and guidance in ensuring I'm properly evaluating my model.

Thanks in advance!

EDIT: Here's a small subset of the data:

simExample <- structure(list(complication = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", 
"1"), class = "factor"), ultrasound = structure(c(1L, 2L, 2L, 
1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
1L), .Label = c("0", "1"), class = "factor"), fNEDOC = structure(c(1L, 
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 2L), .Label = c("Not Overcrowded (00 - 140)", "Overcrowded (140 -
200)"), class = "factor")), .Names = c("complication", "ultrasound",
"fNEDOC"), row.names = c(NA, 20L), class = "data.frame")

View(simExample)
   complication ultrasound                     fNEDOC
1             0          0 Not Overcrowded (00 - 140)
2             0          1    Overcrowded (140 - 200)
3             0          1 Not Overcrowded (00 - 140)
4             0          0 Not Overcrowded (00 - 140)
5             0          1 Not Overcrowded (00 - 140)
6             0          0 Not Overcrowded (00 - 140)
7             0          0 Not Overcrowded (00 - 140)
8             0          1    Overcrowded (140 - 200)
9             0          1 Not Overcrowded (00 - 140)
10            1          0    Overcrowded (140 - 200)
11            0          0 Not Overcrowded (00 - 140)
12            0          1 Not Overcrowded (00 - 140)
13            0          1 Not Overcrowded (00 - 140)
14            0          1    Overcrowded (140 - 200)
15            0          1    Overcrowded (140 - 200)
16            0          1 Not Overcrowded (00 - 140)
17            0          1 Not Overcrowded (00 - 140)
18            0          0 Not Overcrowded (00 - 140)
19            0          1 Not Overcrowded (00 - 140)
20            0          0    Overcrowded (140 - 200)
Niraj Vyas
  • 21
  • 4
  • 1
    The function you name is `residuals()`, not `resid()`. Did you try `residuals(NEDOCModel, "gof")`? When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 13 '18 at 19:24
  • @MrFlick Yes. I've tried both; `resid()` is a shortened call for `residuals()`. I'll add a small subset of the data in a few minutes. – Niraj Vyas Mar 13 '18 at 19:35
  • You should share data in a **reproducible** format as described in the link provided. What you've shown is very hard to copy/paste into R to test. – MrFlick Mar 13 '18 at 19:47
  • Why not use a dataset that is in one of help pages? – IRTFM Mar 13 '18 at 19:50
  • @MrFlick I've added the code required to add the subset as a data frame as indicated in the link. Let me know if there are still issues. – Niraj Vyas Mar 13 '18 at 20:12
  • @42- I'm having trouble finding the page you are referencing. I've added the data set I'm using because the generation method (as indicated in the examples for `residual.lrm`) works, but my data does not. – Niraj Vyas Mar 13 '18 at 20:12
  • 1
    The `residuals.rlm` function works on models created with the `lrm()` function. You used the `glm()` function so you need to look at the `?residuals.glm` help page. There is no "gof" options for glm models. I'm not really even sure what that's supposed to do from the help page. – MrFlick Mar 13 '18 at 20:16
  • 1
    @MrFlick Constructing the model with `lrm()` does allow for the `resid()` function to work properly. I just want to confirm before I go off and learn some more stuff: `lrm()` is an appropriate replacement for `glm()` in this case because the outcome is binary? Thanks for your help! – Niraj Vyas Mar 13 '18 at 20:26
  • seconding @MrFlick's comment; yes. `lrm()` is essentially a parallel-universe implementation of `glm()` that allows some different options. – Ben Bolker Mar 13 '18 at 20:54

1 Answers1

0

If the outcome is either binomial or ordered, then lrm can be used. The examination of generalized measures of "fit" is not what I use to judge the validity of a model, however. I look at the residuals against each variable to consider the possibility of non-linearities and assess the possibility that the rcs spline functions are needed for better fit. Your example is insufficiently complex to support a demonstration of that approach. And .. you are saying the something (no code actually included in your question) doesn't "work" (without a clear statement of what "working" might mean.

library(rms)
NEDOCModel <- lrm(complication ~ ultrasound + fNEDOC, data = simExample, y=TRUE,x=TRUE);

 residuals(NEDOCModel)
#--------
 [1] -9.437191e-05 -1.746181e-04 -1.648351e-08 -9.437191e-05 -1.648351e-08 -9.437191e-05
 [7] -9.437191e-05 -1.746181e-04 -1.648351e-08  5.000005e-01 -9.437191e-05 -1.648351e-08
[13] -1.648351e-08 -1.746181e-04 -1.746181e-04 -1.648351e-08 -1.648351e-08 -9.437191e-05
[19] -1.648351e-08 -4.999995e-01
residuals(NEDOCModel,type="gof")
#-------
Sum of squared errors     Expected value|H0                    SD                     Z 
         0.5000001754          0.5012646602          0.0003628642         -3.4847333888 
                    P 
         0.0004926276 

You should, however, learn to use the rest of the features of the rms/Hmisc system of interconnected functions. You have not defined datadist, so the summary function will not work with lrm-classed objects

ddist <- datadist(simExample)
options(datadist='ddist')

summary(NEDOCModel)
#-----------
            Effects              Response : complication 

 Factor                                                       Low High Diff. Effect     S.E.  
 ultrasound - 0:1                                             2   1    NA        8.6527 37.864
  Odds Ratio                                                  2   1    NA     5725.8000     NA
 fNEDOC - Overcrowded (140 -\n200):Not Overcrowded (00 - 140) 1   2    NA        9.2682 42.045
  Odds Ratio                                                  1   2    NA    10595.0000     NA
 Lower 0.95  Upper 0.95
 -6.5559e+01 8.2865e+01
  3.3731e-29 9.7195e+35
 -7.3139e+01 9.1676e+01
  1.7218e-32 6.5198e+39
IRTFM
  • 258,963
  • 21
  • 364
  • 487