Issues performing Hosmer-Lemeshow test

Question

Please help!

I am trying to perform a Hosmer-Lemeshow (HL) test to assess the goodness of fit of my model, but I keep getting the error shown below!

I have these packages loaded:

    library(jtools)
    library(skimr)
    library(epiR)
    library(rms)
    library(epimisc)
    library(DescTools)
    library(car)
    library(readxl)
    library(summarytools)
    library(survival)
    library(ggplot2)
    library(survminer)
    library(PredictABEL)

This is my code:

# Final model
    l_final <- glm(pro$treatment ~ pro$age,family = binomial(link="logit"),data = pro, x=TRUE)
    summ(l_final)
  
# Assessing the fit of the model
    
    # predict probabilities
    pro$l_final_pred <- predict(l_final, type = "response")

    # Hosmer-Lemeshow test
    HosmerLemeshowTest(pro$l_final_pred, pro$treatment, X = l_final_pred$x)

This is the output:

> # Assessing the fit of the model
>     pro$l_final_pred <- predict(l_final, type = "response")
Error:
! Assigned data `predict(l_final, type = "response")` must be compatible with existing data.
✖ Existing data has 866 rows.
✖ Assigned data has 864 rows.
ℹ Only vectors of size 1 are recycled.
Backtrace:
  1. base::`$<-`(`*tmp*`, l_final_pred, value = `<dbl>`)
 12. tibble (local) `<fn>`(`<vctrs___>`)
>     # Hosmer-Lemeshow test
>     HosmerLemeshowTest(pro$l_final_pred, pro$treatment, X = l_final_pred$x)   
Error in cut.default(fit, breaks = brks, include.lowest = TRUE) : 
  'x' must be numeric
In addition: Warning message:
Unknown or uninitialised column: `l_final_pred`.

Thank you!!

I know it has something to do with the lengths of the variables (treatment has two missing values, so 864 observations, compared with 866 for age), but I can't quite work out how to fix it.

roma · Answer 1 · 2023-05-01T22:14:21.883

Start by checking the data.

Does it contain any NULL/NA values?

If so, work out why they are there and, if necessary, remove those rows

(using, for example: 1 , 2 or 3).

Is every column stored as an appropriate type?

For example, a number may be stored as a character string, which is not appropriate

(using, for example: 4 , 5 or 6).
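Both checks can be done in base R. The snippet below is only a minimal sketch on a made-up data frame (`dat` and its values are hypothetical, not the data from the question): it counts missing values per column, lists the column types, and then uses `type.convert()` to re-guess the types once the "NULL" marker has been turned into NA.

```r
# Hypothetical toy data: `age` was read in as text, `treatment` has a gap
dat <- data.frame(
  age = c("34", "51", "NULL", "47"),
  treatment = c(1, NA, 0, 1),
  stringsAsFactors = FALSE
)

na_counts <- colSums(is.na(dat))  # number of NA values per column
col_types <- sapply(dat, class)   # storage type of each column
print(na_counts)
print(col_types)

# Turn the "NULL" marker into a real NA, then let base R re-guess the types
dat$age[dat$age == "NULL"] <- NA
dat_fixed <- type.convert(dat, as.is = TRUE)
str(dat_fixed)
```

Note that the string "NULL" is not counted by `is.na()` until it has been replaced, which is why the type check matters as much as the NA check.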

This is how I would do it:

if (!require("pacman"))
  install.packages("pacman")

# install/load the needed library
pacman::p_load(
  hablar)


# create data frame
df <- data.frame(
  x = c(1, 3, NA,-9.5, "NULL"),
  y = c("2", "5", "-0.1", "-3", "100"),
  text1 = letters[1:5],
  text2 = LETTERS[1:5],
  z = c(78, 10,-99, NA, 99)
)


# check data type of data frame
str(df)


# replace the string "NULL" with NA so that both are treated as missing
df[df == "NULL"] <- NA

# filter rows with NA
df[!complete.cases(df), ]
# (either replace the NAs with values or remove the rows that contain them)


# remove rows with NA
df <- na.omit(df)


# fix "inadequate" data
df <- retype(df) 
str(df)

Output

str(df)
'data.frame':   5 obs. of  5 variables:
 $ x    : chr  "1" "3" NA "-9.5" ...
 $ y    : chr  "2" "5" "-0.1" "-3" ...
 $ text1: chr  "a" "b" "c" "d" ...
 $ text2: chr  "A" "B" "C" "D" ...
 $ z    : num  78 10 -99 NA 99

filter rows with NA
     x    y text1 text2   z
3 <NA> -0.1     c     C -99
4 -9.5   -3     d     D  NA
5 <NA>  100     e     E  99

remove rows with NA
  x y text1 text2  z
1 1 2     a     A 78
2 3 5     b     B 10

str(df) after fixing "inadequate" data
'data.frame':   2 obs. of  5 variables:
 $ x    : int  1 3
 $ y    : int  2 5
 $ text1: chr  "a" "b"
 $ text2: chr  "A" "B"
 $ z    : int  78 10
 - attr(*, "na.action")= 'omit' Named int [1:3] 3 4 5
  ..- attr(*, "names")= chr [1:3] "3" "4" "5"
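Applied to the model in the question, the same idea might look like the sketch below. The data are simulated as a stand-in for `pro` (the values are made up; only the 866 rows and the two missing `treatment` values mirror the question). Option A, `na.action = na.exclude`, is an alternative I am adding here rather than something from the steps above: it keeps all rows and pads the predictions with NA. Option B drops the incomplete rows first, as described above. The `HosmerLemeshowTest()` call assumes DescTools is installed and leaves out the optional `X` argument to keep things small.

```r
# Simulated stand-in for the question's `pro` data (hypothetical values)
set.seed(1)
pro <- data.frame(
  age = round(rnorm(866, mean = 50, sd = 10)),
  treatment = rbinom(866, size = 1, prob = 0.4)
)
pro$treatment[c(10, 200)] <- NA  # two missing outcomes, as in the question

# Option A: na.exclude pads the predictions with NA, so lengths match `pro`
fit_a <- glm(treatment ~ age, family = binomial(link = "logit"),
             data = pro, na.action = na.exclude)
pro$pred <- predict(fit_a, type = "response")  # length 866, NA in rows 10 and 200

# Option B: keep only rows where both model variables are present
pro_cc <- pro[complete.cases(pro[, c("treatment", "age")]), ]
fit_b <- glm(treatment ~ age, family = binomial(link = "logit"), data = pro_cc)
pro_cc$pred <- predict(fit_b, type = "response")  # length 864

# Run the test only if DescTools is available
if (requireNamespace("DescTools", quietly = TRUE)) {
  print(DescTools::HosmerLemeshowTest(fit = pro_cc$pred, obs = pro_cc$treatment))
}
```

Writing the formula as `treatment ~ age` with `data = pro`, rather than `pro$treatment ~ pro$age`, also lets `glm()` and `predict()` look the variables up in the data frame they are given.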

Thanks for pointing to these links as possible solutions. However, please review [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer), particularly the "Provide context for links" section. Providing further context for your answer will help others assess whether these options will solve the OP's issues, especially if those links go dead in the future. Thanks. — L Tyrone, Apr 28 '23 at 23:09
