0

I am looking to model in R, clustered data with a GLM using the Gamma family and log link. Ultimately I want marginal predictions. The Prediction module doesn't seem to like clustered data. I am wondering if I am making a mistake or if there is another approach or other modules I should be considering?

Thanks for your help.

Unclustered GLM, Prediction module runs fine

> fit <-   glm(abssnrdiff~hour, 
+              family=Gamma(link="log"),
+              data=foranalsub2
+ )
> summary(prediction(fit, at = list(hour =c( "1", "2", "3"))), type="link")
at(hour)

Prediction     SE     z p lower upper
        1      8.312 0.1122 74.08 0 8.092 8.532
        2      8.095 0.1239 65.31 0 7.852 8.338
        3      8.189 0.1417 57.80 0 7.911 8.466
> 

Using miceadds for clustered GLM Prediction fails

> fit <-  miceadds::glm.cluster(data=foranalsub2,
+                               formula = abssnrdiff~ hour ,
+                               cluster="txrx", family=Gamma(link="log")
+ )
> summary(prediction(fit, at = list(hour =c( "1", "2", "3"))), type="link")


Error in find_data.default(model, parent.frame()) : 
  'find_data()' requires a formula call

Trying parglm also fails

> fit <-  parglm(
+   abssnrdiff~hour,
+   data=foranalsub2, family=Gamma(link="log"),  cluster="txrx" ,
+   control = parglm.control(nthreads = 4L),
+   na.action = na.omit)
> end_time <- Sys.time()
> summary(prediction(fit, at = list(hour =c( "1", "2", "3"))), type="link")


Error in qr.R(qr.lm(object)) : argument is not a QR decomposition

**Here is a toy example that generates the same errors shown above.**

...

library(parglm)
library(prediction)
library(sandwich)

a <- structure(list(Family = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 
                          4, 4, 4, 4, 4, 2), 
               Sex = c("F", "F", "F", "M", "M", "F", "F", 
                        "M", "M", "M", "F", "M", "F", "F", "M", "M", "M", "F"),
               Height = c(67, 66, 64, 71, 72, 63, 67, 69, 68, 70, 63, 64, 67, 66, 67, 69, 67, 63)),
               row.names = c(NA, 18L), class = "data.frame") 

fit <-       glm(Height~Sex, 
             family=Gamma(link="log"),
             data=a
)
summary(prediction(fit), type="link")



fit <-       miceadds::glm.cluster(Height~Sex,
             data=a, family="Gamma", cluster="Family"
                 
)
summary(prediction(fit), type="link")

fit <-       parglm(Height~Sex, 
                 data=a, family=Gamma(link="log"), cluster="Family",
                 control = parglm.control(nthreads = 1L),
                 na.action = na.omit
                 
)
summary(prediction(fit), type="link")


BobG
  • 1
  • 1
  • Welcome to SO, BobG! You will be most likely to get your question answered if you follow the [helpful advice](https://stackoverflow.com/help/how-to-ask) about asking questions and create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – DaveArmstrong Feb 04 '22 at 13:07
  • Ah, good idea. I put a toy example above. I'm running in RStudio, 2021.09.1 Build 373 – BobG Feb 04 '22 at 16:01

1 Answers1

0

If you look at the internals of glm.cluster() you'll see that it is doing three things:

  1. Generating weights with lm_cluster_subset().
  2. Estimating a regular GLM with the weights constructed in step1.
  3. Using vcovCL() from the {sandwich} package to generate a variance-covariance matrix for the parameters.

You could do these things yourself and then get prediction to work.

library(prediction)
library(sandwich)

a <- structure(list(Family = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 
                               4, 4, 4, 4, 4, 2), 
                    Sex = c("F", "F", "F", "M", "M", "F", "F", 
                            "M", "M", "M", "F", "M", "F", "F", "M", "M", "M", "F"),
                    Height = c(67, 66, 64, 71, 72, 63, 67, 69, 68, 70, 63, 64, 67, 66, 67, 69, 67, 63)),
               row.names = c(NA, 18L), class = "data.frame") 


## Generate cluster weights
pos <- parent.frame()
cl_dat <- miceadds:::lm_cluster_subset(a, cluster="Family", weights=NULL, subset=NULL, pos=parent.frame())

## fit GLM
fit <-       glm(Height~Sex, 
                 family=Gamma(link="log"),
                 data=a, 
                 weights = cl_dat$wgt__)


## calculate variance-covariance matrix
library(sandwich)
v <- vcovCL(fit, cluster = a$Family)

## pass relevant arguments to prediction()
summary(prediction(fit, data=a, vcov=v), type="link")
#>  Prediction     SE     z p lower upper
#>       66.83 0.6002 111.4 0 65.66 68.01

Created on 2022-02-04 by the reprex package (v2.0.1)

It's worth noting that in this instance, the wgt__ value is just NULL, so you could even skip step 1 in this case because it's not providing any additional information to the GLM estimated in step 2. I include it here more for pedagogical reasons.

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25