Here is a base R approach, also using the iris data set:
data(iris)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Since there are 150 rows, there are 150 - 20 + 1 = 131 rolling groups of 20 consecutive rows. We use that count to create a matrix with 20 rows and 131 columns, where each column lists the row numbers to be used in one regression:
rows <- nrow(iris)
last <- rows + 1 - 20
idx <- sapply(1:last, seq, length.out=20)
str(idx)
# num [1:20, 1:131] 1 2 3 4 5 6 7 8 9 10 ...
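As a quick sanity check on the index matrix, the sketch below rebuilds it from the row count and window width and looks at the first and last windows. The variable name width is introduced here for illustration only; the answer itself hard-codes 20.

```r
# Rebuild the rolling index matrix for 150 rows and a window of 20.
# 'width' is an illustrative name, not used in the original answer.
rows  <- nrow(iris)          # 150
width <- 20
last  <- rows - width + 1    # start row of the final window: 131

# Column j holds the row numbers j, j + 1, ..., j + width - 1
idx <- sapply(1:last, seq, length.out = width)

dim(idx)                     # 20 rows, 131 columns
range(idx[, 1])              # first window covers rows 1 to 20
range(idx[, last])           # last window covers rows 131 to 150
```

Each column starts one row later than the previous one, so consecutive windows overlap in 19 rows.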
So we have 131 columns and each column identifies a group of 20 rows for a regression. Now compute the 131 regressions and save the coefficients:
results <- lapply(1:131, function(x) lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, iris[idx[, x], ]))
coeffs <- t(sapply(results, coef))
head(coeffs)
# (Intercept) Sepal.Width Petal.Length Petal.Width
# [1,] 0.88165253 1.1027541 0.4335847 -1.3039612
# [2,] 0.64094220 1.1111668 0.6186075 -1.4860753
# [3,] 0.28030724 1.2120241 0.6477881 -1.7022181
# [4,] -0.01943516 1.1879500 0.8971728 -1.6773764
# [5,] 0.46106345 0.9888293 0.9230228 -0.8457783
# [6,] 0.92206667 0.9734378 0.5716684 -0.5058189
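The same steps can be wrapped in a small function so that the window width is not hard-coded. This is only a sketch under stated assumptions: rolling_coefs and its arguments are hypothetical names, not part of the answer above, and it indexes each window directly with i:(i + width - 1) instead of building the idx matrix first.

```r
# Hypothetical helper (not from the original answer): fit the same model on
# every window of 'width' consecutive rows and return one row of
# coefficients per window.
rolling_coefs <- function(formula, data, width) {
  starts <- seq_len(nrow(data) - width + 1)
  fits <- lapply(starts, function(i) {
    lm(formula, data = data[i:(i + width - 1), ])
  })
  t(sapply(fits, coef))
}

coeffs20 <- rolling_coefs(
  Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
  data = iris, width = 20
)
dim(coeffs20)
head(coeffs20, 3)
```

With width = 20 this should reproduce the coeffs matrix computed above; other widths only change the number of rows returned.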
Each fitted model is stored as an element of the list results, so the first regression is results[[1]]:
summary(results[[1]])
#
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
# data = iris[idx[, x], ])
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.26396 -0.17137 -0.00562 0.13582 0.36386
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.8817 0.6730 1.310 0.209
# Sepal.Width 1.1028 0.1748 6.309 1.04e-05 ***
# Petal.Length 0.4336 0.3448 1.257 0.227
# Petal.Width -1.3040 0.7924 -1.646 0.119
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.204 on 16 degrees of freedom
# Multiple R-squared: 0.8078, Adjusted R-squared: 0.7717
# F-statistic: 22.41 on 3 and 16 DF, p-value: 5.666e-06
Getting a statistic computed by summary is slightly more involved:
# Adjusted R-squared for each of the 131 regressions
Rsq <- sapply(results, function(x) summary(x)$adj.r.squared)
quantile(Rsq)
# 0% 25% 50% 75% 100%
# 0.1635166 0.4471409 0.6298927 0.8417655 0.9278258
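Other quantities from summary can be pulled out in the same way. The sketch below extracts the residual standard error and the p-value for Sepal.Width from each fit. It rebuilds idx and results so that it runs on its own, and it uses vapply instead of sapply, which is a substitution made here (not in the answer above) so that R checks that each call returns a single number. The names sigma_hat and p_sw are illustrative.

```r
# Rebuild the rolling fits so this block is self-contained
idx <- sapply(1:131, seq, length.out = 20)
results <- lapply(1:131, function(x) {
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
     iris[idx[, x], ])
})

# Residual standard error of each regression
sigma_hat <- vapply(results, function(m) summary(m)$sigma, numeric(1))

# p-value of the Sepal.Width coefficient, read from the coefficient table
p_sw <- vapply(results, function(m) {
  coef(summary(m))["Sepal.Width", "Pr(>|t|)"]
}, numeric(1))

quantile(sigma_hat)
mean(p_sw < 0.05)   # share of windows where Sepal.Width is significant at 5%
```

For the first window these values should match the summary(results[[1]]) output shown earlier (residual standard error 0.204, p-value 1.04e-05 for Sepal.Width).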