0

I currently have a difference in difference model and would like to measure different metrics for the same model in an efficient manner.

For example, I have a data frame with columns for Miles Driven, Hours Worked, State, Group, Time.

Currently, have code where I copy and paste the model for each metric:

# Create DID models
model_miles <- lm(df$miles_driven ~ df$Group 
                        + df$Time 
                        + df$Group * df$Time, data = df)
model_hours <- lm(df$hours_worked ~ df$Group 
           + df$Time 
           + df$Group * df$Time, data = df)

# Select results using clustered standard errors. The purpose of this is to 
# avoid making distributional assumptions about the errors in the models. 
results_miles <- clubSandwich::coef_test(model_miles, 
                                            vcov = "CR2", 
                                            cluster = df$state, 
                                            test = "Satterthwaite")
results_hours <- clubSandwich::coef_test(model_hours, 
                               vcov = "CR2", 
                               cluster = df$state, 
                               test = "Satterthwaite")

results <- data.table::rbindlist(list(results_miles, results_hours))
View(results)

I would like to somehow create a list of my metric names, and loop over this list using a user defined function, in order to make this process faster and more automated, but I haven't been able to get this to work correctly:

#list of metrics
metrics <- c("miles_driven", "hours_worked")

udf <- function(metric, dataframe){
     # Create DID model
     model <- lm(dataframe$metric ~ df$Group 
                + dataframe$Time 
                + dataframe$Group * df$Time, data = dataframe)

     # Select results using clustered standard errors. The purpose of this 
     is to 
     # avoid making distributional assumptions about the errors in the 
     models. 
     results_miles <- clubSandwich::coef_test(model_miles, 
                                       vcov = "CR2", 
                                       cluster = dataframe$state, 
                                       test = "Satterthwaite")[4,]
     View(results)
}

lapply(metrics, udf)

Any insight would be appreciated. Thanks!

1 Answers1

0

This sounds like a job for map() from purrr, along with tidyr's unnest()!

Next time, it would be helpful to build a reproducible example, but here is one we can start with.

library(tidyverse)

train <- data_frame(metric = c("mpg",
                               "disp"),
                    formula = list(as.formula("mpg ~ ."), 
                                   as.formula("disp ~ ."))) %>%
    mutate(model = map(formula, ~lm(.x, data = mtcars)))

train
#> # A tibble: 2 x 3
#>   metric formula       model   
#>   <chr>  <list>        <list>  
#> 1 mpg    <S3: formula> <S3: lm>
#> 2 disp   <S3: formula> <S3: lm>

Notice here that I set up a column with the name of the metrics we are going to use as the predicted quantities in the models, and then columns that contain formulas for modeling. You could get even fancier with paste() there and mutate() and not retype the names of the predicted quantities. Then use purrr's map() to fit a model for each of the predicted quantities. You have a column that contains models now.

Next, it's time to test all your regression coefficients. Notice how map() takes its arguments: first, the quantity you are mapping over (the models), then the function you are applying to each of those models (the function from clubSandwich), then lastly the other arguments that need to be passed to that function.

tests <- train %>%
    mutate(coeffs = map(model, clubSandwich::coef_test, "CR2", "Satterthwaite", "All", mtcars$cyl))


tests
#> # A tibble: 2 x 4
#>   metric formula       model    coeffs                           
#>   <chr>  <list>        <list>   <list>                           
#> 1 mpg    <S3: formula> <S3: lm> <coef_test_clubSandwich [11 × 4]>
#> 2 disp   <S3: formula> <S3: lm> <coef_test_clubSandwich [11 × 4]>

We did it! The last step is to use tidyr's unnest() so we can easily get to all those quantities from the regression coefficient tests.

tests %>%
    unnest(coeffs)
#> # A tibble: 22 x 5
#>    metric     beta      SE    df p_Satt
#>    <chr>     <dbl>   <dbl> <dbl>  <dbl>
#>  1 mpg     12.3    16.3     1.52  0.550
#>  2 mpg     -0.111   1.44    2.00  0.945
#>  3 mpg      0.0133  0.0159  1.34  0.526
#>  4 mpg     -0.0215  0.0330  1.82  0.588
#>  5 mpg      0.787   0.493   2.05  0.248
#>  6 mpg     -3.72    2.05    1.33  0.270
#>  7 mpg      0.821   0.388   1.34  0.228
#>  8 mpg      0.318   1.55    1.74  0.859
#>  9 mpg      2.52    1.25    1.73  0.202
#> 10 mpg      0.655   1.84    1.73  0.761
#> # ... with 12 more rows

Created on 2018-04-26 by the reprex package (v0.2.0).

Julia Silge
  • 10,848
  • 2
  • 40
  • 48