I am trying to manually generate lines of a ggplot to display confidence intervals. I am extracting data from multiple different models external to the ggplot, and then plotting the results within the ggplot manually. However, I have a great number of models and specific variables to select, so I would like to automate this process.
How can I use the lapply function (or another) to do so? For example, suppose variable1, variable2, and variable3 are all from different models.
This would be the rough code to plot one confidence interval (for variable1):
plot <- ggplot(data = NULL, aes(x = c("variable1", "variable2", "variable3"), y = c(-1, 1))) +
  labs(y = "Estimate", x = "Model") +
  geom_segment(aes(x = "variable1", xend = "variable1",
                   y = (Betahat_variable1 - 1.96 * se_variable1),
                   yend = (1.96 * se_variable1 + Betahat_variable1))) +
  coord_flip()
How can I use lapply (or another syntax) to repeat this for variable1, variable2, variable3, etc., without having to manually write out each betahat/se? I can extract those variables from the models using the same format (because the models are similar), but how can I insert these character strings within the ggplot code, rather than writing the digits out themselves? This would allow me to draw the plot much more efficiently.
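One possible direction, sketched here under assumptions: if the estimates and standard errors are first collected into a data frame with one row per model, a single geom_segment() layer can draw every interval, so nothing has to be written out per variable. The data frame name ci_df, its column names, and the numbers in it are hypothetical placeholders, not values from the real models.

```r
library(ggplot2)

# Hypothetical estimates and standard errors, one row per model
# (made-up numbers standing in for values extracted from the felm fits).
ci_df <- data.frame(
  model   = c("variable1", "variable2", "variable3"),
  betahat = c(0.20, -0.10, 0.35),
  se      = c(0.05, 0.08, 0.10)
)

# 95% interval bounds, computed once for every row
ci_df$lower <- ci_df$betahat - 1.96 * ci_df$se
ci_df$upper <- ci_df$betahat + 1.96 * ci_df$se

# A single geom_segment() layer draws one interval per row of ci_df
ci_plot <- ggplot(ci_df, aes(x = model)) +
  geom_segment(aes(xend = model, y = lower, yend = upper)) +
  geom_point(aes(y = betahat)) +
  labs(y = "Estimate", x = "Model") +
  coord_flip()
```

Because the aesthetics are mapped to columns, adding more models only means adding rows to the data frame; the plotting code stays the same.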
Dummy data:
This is how I extract the metrics to build the confidence intervals. All the models are felm regressions with fixed effects and multiple controls. The listed felm regressions are compiled within the object models, from which I extract variable1 (models[[1]]), variable2 (models[[2]]), etc.
df:
Variable Group Outcome1 Outcome2 Outcome3
1 1 2 4 0
2 4 5 6 0
3 2 3 2 4
4 1 1 6 1
# felm() comes from the lfe package
models <- paste0("outcome", 1:10, " ~ variable | group|0|group") |>
  lapply(\(x) felm(as.formula(x), data = df))
# variable1 coefficient
models[[1]]$coefficients[1]
# variable1 p-value
models[[1]]$pval
# estimate and standard error used to build the confidence interval
Betahat_variable1 <- coef(models[[1]])[1]
se_variable1 <- models[[1]]$se[1]
Unfortunately, I cannot provide the original data, but all of the values extracted from the felm models are numeric and lie between -1 and 1, so the extraction process is similar across all of the different models/variables.
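Since the extraction has the same shape for every model, the per-model step could be wrapped in lapply and the pieces bound into one data frame. The following is only a minimal sketch: the list below uses plain stand-in objects that mimic the two fields accessed above ($coefficients and $se) with made-up numbers, and the names ci_list and ci_df are hypothetical. With real felm fits, the stand-in list would be replaced by the actual models object.

```r
# Stand-in objects mimicking the fields used above ($coefficients, $se).
# The numbers are made up; in practice `models` is the list of felm fits.
models <- list(
  list(coefficients = c(variable = 0.20),  se = c(variable = 0.05)),
  list(coefficients = c(variable = -0.10), se = c(variable = 0.08)),
  list(coefficients = c(variable = 0.35),  se = c(variable = 0.10))
)

# Extract the first coefficient and its standard error from each model
ci_list <- lapply(seq_along(models), function(i) {
  m <- models[[i]]
  data.frame(
    model   = paste0("variable", i),  # label used on the plot axis
    betahat = unname(m$coefficients[1]),
    se      = unname(m$se[1])
  )
})

# Stack the one-row data frames and add the 95% interval bounds
ci_df <- do.call(rbind, ci_list)
ci_df$lower <- ci_df$betahat - 1.96 * ci_df$se
ci_df$upper <- ci_df$betahat + 1.96 * ci_df$se
```

The resulting data frame has one row per model, which is the form ggplot expects when aesthetics are mapped to columns rather than typed in by hand.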