3

I am using the feols command from the fixest package in R to estimate fixed effect regressions, in most cases with weights. In order to output my results, I am mainly using the modelsummary command from the modelsummary package. I have figured out how to use the add_rows feature to add further information about the models (in my cases, I'm using it to add indicators for the presence of control variables in a given model, in a similar fashion to how modelsummary can natively indicate the presence of fixed effects in a given model).

However, I would also like to add, to each model, the mean of the dependent variable of the model, estimated on the same observations as the given model is estimated (i.e. accounting for that some observations drop because of missing values, and that this can differ from model to model depending on the structure of missing values), and preferably only for observations with my treatment variable set to == 0. How could I achieve this?

The basic regression structure I have is as follows (with some specifications being simpler, e.g. without the Municipality.ID):

model = feols(depvar ~ i(Year.factor, Treatment.dummy, ref ='2007') + Year.factor + Treatment.dummy + Control.var, data = subset(data.frame, condition < limit), weights = Weight.var, panel.id = c(Year.factor, Municipality.ID), cluster = ~Municipality.ID)

I loop these estimations to the list of models mdls. The modelsummary output code is more or less as follows:

mdl.stats <- list(
  list("raw" = "nobs", "clean" = "Observations", "fmt" = 0),
  list("raw" = "r.squared", "clean" = "R Squared", "fmt" = 3),
  list("raw" = "r2", "clean" = "R Squared", "fmt" = 3),
  list("raw" = "r2.within", "clean" = "R2", "fmt" = 3),
  list("raw" = "within.r.squared", "clean" = "R2 (within)", "fmt" = 3),
  list("raw" = "r.squared.within", "clean" = "R2 (within)", "fmt" = 3),
  list("raw" = "FE: TERYT.f", "clean" = "Municipality FE", "fmt" = 0))
cntrls <- rep(c("", "X", ""), times=length(mdls)/3)
cntrls <- as.data.frame(t(c("Controls", cntrls)))
cntrls <- set_names(cntrls, c("Coefficients", names(mdls)))

modelsummary::modelsummary(mdls, gof_map = mdl.stats, stars = TRUE, output = "latex", booktabs = TRUE, add_rows = cntrls)
Quinten
  • 35,235
  • 5
  • 20
  • 53
artur
  • 33
  • 5
  • Hi @artur, could you please share your data using `dput(df)`? So we can help you better. – Quinten Apr 15 '22 at 10:34
  • Hi @Quinten, good point, thank you. The answer below by @Vincent works very well in terms of the example, using the `mtcars` example data. The thing that's remaining (as I reply to that answer too) is to extract the data used in the `feols` models, to be able to compute the relevant means on those data subsets. Thank you! – artur Apr 15 '22 at 12:27

2 Answers2

4

Sorry I cannot comment on Vincent's answer. Fixest provides the "mean of dependent variable" fitstat under "my", such that your glance_custom function can be simplified to:

glance_custom.fixest <- function(x, ...) {
  out <- data.frame("Mean(DV)" = as.numeric(fitstat(x, type = "my")))
  return(out)
}

and there is no need to depend on insight.

2

You can of course create your own rows manually and add them to the table with the add_rows argument. This just requires a bit of creativity with base R functions in order to create the appropriate data frame.

If you are looking for a more “automated” strategy, the Customizing Existing Models section of the modelsummary website describes a very powerful mechanism to add new information to the bottom of the table. In a nutshell: create a function called glance_custom.fixest which takes a single model object as input and returns a data.frame with only one row and each of the new statistics in different columns.

In this example, I use the find_response() and get_data() functions from the insight package to make it easier to find the dependent variables, extract the data, and compute the mean:

library(modelsummary)
library(fixest)
library(insight)

glance_custom.fixest <- function(x, ...) {
    dv <- insight::find_response(x)
    dat <- insight::get_data(x)
    out <- data.frame("Mean(DV)" = mean(dat[[dv]]), check.names = FALSE)
    return(out)
}

mod <- list(
    feols(mpg ~ hp, data = mtcars[mtcars$cyl == 4,]),
    feols(mpg ~ hp, data = mtcars[mtcars$cyl == 6,]),
    feols(mpg ~ hp, data = mtcars[mtcars$cyl == 8,]))

modelsummary(mod, output = "markdown")
Model 1 Model 2 Model 3
(Intercept) 35.983 20.674 18.080
(5.201) (3.304) (2.988)
hp -0.113 -0.008 -0.014
(0.061) (0.027) (0.014)
Num.Obs. 11 7 14
R2 0.274 0.016 0.080
R2 Adj. 0.193 -0.181 0.004
AIC 63.8 27.9 67.8
BIC 64.6 27.8 69.1
Log.Lik. -29.891 -11.954 -31.920
Std.Errors IID IID IID
Mean(DV) 26.664 19.743 15.100
Vincent
  • 15,809
  • 7
  • 37
  • 39
  • Thank you very much! This is incredibly helpful. In order to do what I want, I need to overcome one more obstacle, though: the `get_data()` method from the `insight` package doesn't seem to work for `feols` output, and I have so far not found another way to extract the data used in `feols` models. Any ideas? – artur Apr 15 '22 at 12:24
  • Try to update your `insight` package, restart `R` completely, and try again. In principle, it *should* work with `feols` models -- as I have demonstrated in my example here, which uses `feols`. I know there have been updates to `insight` recently, so that could help. Otherwise, maybe call `model.frame(x)` instead of `insight::get_data`. – Vincent Apr 15 '22 at 12:27
  • @Vincent, is there any way to customize this so that the Mean(DV) is the first gof statistic in the table? – mikeytop Aug 31 '22 at 07:31
  • Moreover, is there a way to change Mean(DV) label to "Mean of Dependent Variable" without getting strange periods instead of whitespaces? – mikeytop Aug 31 '22 at 07:38
  • Please read the documentation for the `gof_map` argument. There are also examples of its use on the package website. – Vincent Aug 31 '22 at 10:49
  • 1
    You can use backticks when naming the variable in the `glance_custom.fixest` – Vincent Aug 31 '22 at 10:49