It's really important to follow the guidelines when asking a question. Nonetheless, I've made a toy example with the iris dataset.
To run the same regression over different parts of your dataset, you can use the lapply() function, which applies a function over a vector or list (in this case, the species names). Each species is then passed to the subset argument of lm(). First, load the data and collect the species names:
data("iris")
species <- unique(iris$Species)
species
Running species shows the levels of this variable:
[1] setosa versicolor virginica
Levels: setosa versicolor virginica
And running colnames(iris) tells us which variables are available:
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
The lapply() call can then be run like so:
# Fit one model per species; subset keeps only the rows of species x
models <- lapply(species, function(x) {
  lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
     data = iris, subset = iris$Species == x)
})
lapply(models, summary)
The result:
[[1]]
Call:
lm(formula = Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
data = iris, subset = iris$Species == x)
Residuals:
     Min       1Q   Median       3Q      Max
-0.38868 -0.07905  0.00632  0.10095  0.48238
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.86547    0.34331   2.521   0.0152 *
Petal.Width   0.46253    0.23410   1.976   0.0542 .
Sepal.Length  0.11606    0.10162   1.142   0.2594
Sepal.Width  -0.02865    0.09334  -0.307   0.7602
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1657 on 46 degrees of freedom
Multiple R-squared: 0.1449, Adjusted R-squared: 0.08914
F-statistic: 2.598 on 3 and 46 DF, p-value: 0.06356
[[2]]
Call:
lm(formula = Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
data = iris, subset = iris$Species == x)
Residuals:
     Min       1Q   Median       3Q      Max
-0.61706 -0.13086 -0.02966  0.09854  0.54311
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.16506    0.40032   0.412    0.682
Petal.Width   1.36021    0.23569   5.771 6.37e-07 ***
Sepal.Length  0.43586    0.07938   5.491 1.67e-06 ***
Sepal.Width  -0.10685    0.14625  -0.731    0.469
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2319 on 46 degrees of freedom
Multiple R-squared: 0.7713, Adjusted R-squared: 0.7564
F-statistic: 51.72 on 3 and 46 DF, p-value: 8.885e-15
[[3]]
Call:
lm(formula = Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
data = iris, subset = iris$Species == x)
Residuals:
    Min      1Q  Median      3Q     Max
-0.7325 -0.1493  0.0516  0.1555  0.5866
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.46503    0.47686   0.975    0.335
Petal.Width   0.21565    0.17410   1.239    0.222
Sepal.Length  0.74297    0.07129  10.422 1.07e-13 ***
Sepal.Width  -0.08225    0.15999  -0.514    0.610
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2819 on 46 degrees of freedom
Multiple R-squared: 0.7551, Adjusted R-squared: 0.7391
F-statistic: 47.28 on 3 and 46 DF, p-value: 4.257e-14
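If the full summaries are more than you need, the coefficients can be put side by side. This is only a sketch: the name coef_table is my own, and naming the list elements after the species is an extra step not in the code above.

```r
data("iris")
species <- unique(iris$Species)

# Same per-species fits as above
models <- lapply(species, function(x) {
  lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
     data = iris, subset = iris$Species == x)
})

# Label the list so each model can be looked up by species name
names(models) <- as.character(species)

# One column per species, one row per coefficient
coef_table <- sapply(models, coef)
round(coef_table, 3)
```

With the names set, a single fit can also be pulled out directly, e.g. summary(models[["versicolor"]]).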
By the way, your code does not actually perform any stepwise regression. The example above can easily be modified to do so, for instance by passing each fitted model to step().
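One possible way to add the stepwise part is sketched below. It assumes that AIC-based backward elimination via step() from the stats package is what you are after; the names sp_data, full and stepwise_models are made up for illustration. Here each species is subset into its own data frame before fitting, rather than using the subset argument.

```r
data("iris")
species <- unique(iris$Species)

stepwise_models <- lapply(species, function(x) {
  # Rows for the current species only
  sp_data <- iris[iris$Species == x, ]

  # Start from the full model with all three predictors
  full <- lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width,
             data = sp_data)

  # Backward elimination by AIC; trace = 0 suppresses the step-by-step log
  step(full, direction = "backward", trace = 0)
})
names(stepwise_models) <- as.character(species)

# Show which formula was kept for each species
lapply(stepwise_models, formula)
```

Each element of stepwise_models is still an lm object, so lapply(stepwise_models, summary) works just as before.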
Hope this helps.