0

I am not sure if this is a question for stackoverflow or crossvalidated as it contains both an R specific coding part and a general statistics part.

In a project we want to use predictive margins, or more general, means of predicted values. The model used for prediction is a logistic regression model. The data comes from multiple surveys conducted at different times without clusters and comes with weights for known characteristics in the population. The data is weighted for each timespan.

As the data will grow over time, we don't want to do the prediction from the data used for modelling but with a dataframe containing the information of the population.

Thanks to this answer I know how to calculate the confidence intervals for every observation.

However I want to get the mean propabilty for different groups. For the propabilities I can just calculate the mean of the propabilities, but how do I get the right confidence intervals? It seems that marginpred and svypredmeans from the package survey by Thomas Lumley do some of the things I want, but don't allow to do the prediction on new data.

Here is some code and data to show the approach. Please consider that in my real use case the data I use for predictions is not the same I use for modelling.

#libraries
library(dplyr)

# Data for modelling
data(mtcars)
# Get a weight column because in my real use case the data has population weights
mtcars["weight"] <- runif(nrow(mtcars), 500, 1000)
numtofac <- c("cyl", "vs", "am", "gear", "carb")
mtcars[numtofac] <- lapply(numtofac, function(x) factor(mtcars[[x]]))
mtcars["mpg20"] <- ifelse(mtcars$mpg >=20, 1, 0)

# Get prediction data (standardizes for "disp")
# In my real use case the data for prediction is not created from my data for modelling
mtcars_vs0 <- mtcars
mtcars_vs0["vs"] <- factor(0, levels=c(0,1))
mtcars_vs1 <- mtcars
mtcars_vs1["vs"] <- factor(1, levels=c(0,1))
pred_mtcars <- rbind(mtcars_vs0, mtcars_vs1)[c("vs", "disp", "weight")]

# Logistic Regression Model
glm_mpg20 <- glm(mpg20 ~ vs*disp, family = binomial(link = "logit"), data=mtcars, weights = mtcars$weight)

# Prediction on logit scale
preds <- predict(glm_mpg20, newdata=pred_mtcars,  type = "link", se.fit = TRUE)

#Calculate CIs for fitted values
critval <- 1.96 ## approx 95% CI
upr <- preds$fit + (critval * preds$se.fit)
lwr <- preds$fit - (critval * preds$se.fit)
fit <- preds$fit

# Transform from logit to propability scale
fit2 <- glm_mpg20$family$linkinv(fit)
upr2 <- glm_mpg20$family$linkinv(upr)
lwr2 <- glm_mpg20$family$linkinv(lwr)

# combine fitted values and CIs in one dataframe
fitted_mpg20 <- data.frame(propability=fit2,
                           lwr=lwr2,
                           upr=upr2)

# bind CIs and data for prediction
predicted_data <- cbind(pred_mtcars, fitted_mpg20)

# calculate mean propability for vs=0 and vs=1
mean_prop <- predicted %>% 
  group_by(vs) %>% 
  summarise(mean_propability = sum(propability*weight)/sum(weight))

Thank you very much for your help and let me know if you need something else from me

Stephan
  • 113
  • 14
  • If you are familiar with bootstrapping that would be a reasonable way to generate a confidence interval for your estimate in my opinion. – TrainingPizza Feb 16 '22 at 18:32
  • I know what bootstrapping is, but have never done it manually andam not sure which appraoch to use (percentiles or other methods) and as Stata seems to have implemnetd a way calculating what I nedd without bootstrapping (thopugh using the Delta Method for Confidence Intevals) I thought there might be a more "direct approach" – Stephan Feb 21 '22 at 08:40
  • I believe this might be what you’re looking for: https://cran.r-project.org/web/packages/modmarg/vignettes/delta-method.html – TrainingPizza Feb 21 '22 at 17:48

1 Answers1

2

You can use the predictions() function from the marginaleffects() package (disclaimer: I am the author). Here is a minimal example:

library(marginaleffects)

mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)

predictions(mod, newdata = datagrid())
##   rowid     type predicted std.error conf.low conf.high       hp      mpg
## 1     1 response 0.4441406 0.1413224 0.206472 0.7104511 146.6875 20.09062

By default, the datagrid() function will create a data frame with each variable set to its mean, but you can pick arbitrary values:

predictions(mod, newdata = datagrid(mpg = 30, hp = c(100, 120)))
##   rowid     type predicted    std.error  conf.low conf.high mpg  hp
## 1     1 response 0.9999380 0.0002845696 0.6674229         1  30 100
## 2     2 response 0.9999794 0.0001046887 0.6992366         1  30 120

Or feed a data frame:

nd <- data.frame(mpg = 30, hp = c(100, 140))
predictions(mod, newdata = nd)
##   rowid     type predicted    std.error  conf.low conf.high mpg  hp
## 1     1 response 0.9999380 0.0002845696 0.6674229         1  30 100
## 2     2 response 0.9999931 0.0000382223 0.7255411         1  30 140
Vincent
  • 15,809
  • 7
  • 37
  • 39
  • Dear Vincent thank you very much for your answer. What I was looking for are the average adjusted predictions. When I last saw your package I thought it can "only" calculate marginal effects. However I still have the issue of getting the right confidence intervals. You suggest summarizing over the std errors in the vignette. So is the right approach to summarize the standard errors on the propability scale and then transform to the loigt scale and calculate the confident intervals and then transform back again? – Stephan Feb 21 '22 at 08:35
  • 1
    Sorry I misunderstood the question. You are right, this is not currently supported. If you have ideas about the specific statistical procedure that should be applied when computing standard errors for Average Adjusted Predictions, feel free to post here: https://github.com/vincentarelbundock/marginaleffects/issues/216 – Vincent Feb 21 '22 at 12:07
  • It seems you closed the issue but I know found this function in the new release: `avg_predictions()` doesn't this solve my issue or do I understand something wrong? – Stephan Feb 20 '23 at 15:07
  • 1
    Yes @stephan, I believe that `avg_predictions()` is what you need. Please note that version 0.10.0 should be released in the coming days, and that the website anticipates this change. For the next few days, you might have to install from Github for all commands on the website to work. W.r.t. your particular problem, see this section of the vignette: https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html#average-adjusted-predictions-aap – Vincent Feb 20 '23 at 19:17