I have a data set I am trying to fit with bam() in the mgcv package. The model has a binary outcome and I need to specify random intercepts for each animal ID. A subset of the data is below (my actual data is much larger with many more covariates):
dat2 <- read.csv('https://github.com/silasbergen/example_data/raw/main/dat2.csv')
dat2$Animal_id <- factor(dat2$Animal_id)
> head(dat2)
Animal_id DEM_IA Anyrisk
1 105 279.94 0
2 105 278.68 0
3 106 329.13 0
4 106 329.93 0
5 106 332.25 0
6 106 333.52 0
> summary(dat2)
Animal_id DEM_IA Anyrisk
105: 2 Min. :156.3 Min. :0.0000
106: 83252 1st Qu.:246.8 1st Qu.:0.0000
107: 22657 Median :290.1 Median :0.0000
108:104873 Mean :284.8 Mean :0.3619
109:142897 3rd Qu.:318.0 3rd Qu.:1.0000
110: 53967 Max. :411.8 Max. :1.0000
I want to fit the model and predict for new data without the random effect:
library(mgcv)
mod <- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)
But this throws an error:
Error in eval(predvars, data, env) : object 'Animal_id' not found
Why would it need Animal_id
when I am specifically telling it to exclude this term from the prediction? This is also especially strange as I can run the similar examples in the ?random.effects
mgcv
help file, no problem, even if I modify those examples to use bam() instead of gam()! Any help would be greatly appreciated!
EDIT
I may have found a fix; apparently if using discrete=TRUE
in the bam()
model, then predict.bam()
also uses discrete=TRUE
which will not work with missing random effect, but this works:
mod<- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod,topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE,discrete=FALSE)
Output:
1 2
-0.4451066 -0.0285989