i have the following df:
df <- data.frame(
Company = rep(LETTERS[1:10], each = 10),
Year = rep(2010:2019, times = 10),
Var1 = runif(100, -1, 1),
Var2 = runif(100, -1, 1),
Var3 = runif(100, -1, 1),
Var4 = runif(100, -1, 1),
Var5 = runif(100, -1, 1)
)
I am trying to run 10 ols regressions per year and adding all the coefficients, standard errors, t-statistic and p value into one big new dataframe. I came across this link and tried the following: Running several linear regressions from a single dataframe in R
coef_matrix <- df %>% group_by(Year) %>% do(tidy(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df)))
However, this gives me this output:
head(coef_matrix,15)
# A tibble: 15 x 6
# Groups: Year [3]
Year term estimate std.error statistic p.value
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 2010 (Intercept) 0.0274 0.0591 0.463 0.644
2 2010 Var2 -0.0339 0.113 -0.301 0.764
3 2010 Var3 -0.0705 0.112 -0.628 0.532
4 2010 Var4 0.0663 0.0973 0.682 0.497
5 2010 Var5 0.201 0.0983 2.04 0.0440
6 2011 (Intercept) 0.0274 0.0591 0.463 0.644
7 2011 Var2 -0.0339 0.113 -0.301 0.764
8 2011 Var3 -0.0705 0.112 -0.628 0.532
9 2011 Var4 0.0663 0.0973 0.682 0.497
10 2011 Var5 0.201 0.0983 2.04 0.0440
11 2012 (Intercept) 0.0274 0.0591 0.463 0.644
12 2012 Var2 -0.0339 0.113 -0.301 0.764
13 2012 Var3 -0.0705 0.112 -0.628 0.532
14 2012 Var4 0.0663 0.0973 0.682 0.497
15 2012 Var5 0.201 0.0983 2.04 0.0440
>
I want to try and run 10 annual regressions for every year and then store all their coefficients, standard errors etcetera into a dataframe. Right now the coefficients are the same as if i would run a normal ols regression for all years together:
summary(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df))
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df)
Residuals:
Min 1Q Median 3Q Max
-1.10859 -0.51344 -0.05597 0.49481 1.13573
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02736 0.05909 0.463 0.644
Var2 -0.03392 0.11252 -0.301 0.764
Var3 -0.07045 0.11223 -0.628 0.532
Var4 0.06633 0.09730 0.682 0.497
Var5 0.20064 0.09828 2.041 0.044 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5851 on 95 degrees of freedom
Multiple R-squared: 0.05331, Adjusted R-squared: 0.01345
F-statistic: 1.338 on 4 and 95 DF, p-value: 0.2617
How do i change this? I thought the group_by function was supposed to run the annual regression, but it seems like i am making a mistake. So for every year, the coefficients, standard errors etcetera should be different (of course). So for 2010, the output should be this in the newly created df:
summary(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df[df$Year=="2010", ]))
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df[df$Year ==
"2010", ])
Residuals:
1 11 21 31 41 51 61 71 81 91
-0.04871 -0.08103 -0.79737 0.10159 0.18243 0.41049 -0.05951 -0.12389 0.18528 0.23072
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3293 0.2223 -1.481 0.1986
Var2 0.8657 0.3569 2.426 0.0597 .
Var3 0.5305 0.4735 1.120 0.3134
Var4 -0.2685 0.2446 -1.097 0.3225
Var5 -0.1988 0.3729 -0.533 0.6167
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4389 on 5 degrees of freedom
Multiple R-squared: 0.612, Adjusted R-squared: 0.3017
F-statistic: 1.972 on 4 and 5 DF, p-value: 0.2372
And this for 2011, and so on:
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df[df$Year ==
"2011", ])
Residuals:
2 12 22 32 42 52 62 72 82 92
-0.541020 -0.626232 0.627003 0.326570 0.450827 -0.505888 -0.008373 0.264653 0.190374 -0.177914
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.23500 0.27901 0.842 0.438
Var2 -0.49973 0.59313 -0.843 0.438
Var3 0.21137 0.55453 0.381 0.719
Var4 0.34993 0.40960 0.854 0.432
Var5 0.09965 0.37772 0.264 0.802
Residual standard error: 0.597 on 5 degrees of freedom
Multiple R-squared: 0.2734, Adjusted R-squared: -0.3079
F-statistic: 0.4703 on 4 and 5 DF, p-value: 0.7577