Running a regression

Question

Background: my data set has 52 rows and 12 columns (assume the column names are A through L), and it is named foo.

I am told to run a regression where foo$L is the dependent variable, and all other variables are independent except for foo$K.

The way I was doing it is

fit <- lm(foo$L ~ foo$A + ... + foo$J)

then calling

summary(fit)

Is this a good way to run a regression and find the intercept and coefficients?

You should [make your example reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That said, you shouldn't subset within formula notation—use the `data` parameter, it's done automatically: `lm(mpg ~ wt + hp, mtcars)` — alistaire, Jun 04 '18 at 00:12
Agreeing with @alistaire, but using your example you should use something like `lm(L ~ ., data=foo[,-11])` — G5W, Jun 04 '18 at 00:13

score 3 · Answer 1 · answered Jun 04 '18 at 00:22

Use the data argument to lm so you don't have to use the foo$ syntax for each predictor. Use dependent ~ . as the formula to have the dependent variable predicted by all other variables. Then you can use - K to exclude K:

data_mat = matrix(rnorm(52 * 12), nrow = 52)

df = as.data.frame(data_mat)
colnames(df) = LETTERS[1:12]

lm(L ~ . - K, data = df)
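To pull out the intercept and coefficients that the question asks about, here is a minimal sketch. It assumes the same simulated data as above; the seed and the object name fit are only there for illustration and are not part of the original question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- as.data.frame(matrix(rnorm(52 * 12), nrow = 52))
colnames(df) <- LETTERS[1:12]

# L is the response, every other column except K is a predictor
fit <- lm(L ~ . - K, data = df)

coef(fit)                   # named vector: "(Intercept)", then A through J
coef(fit)["(Intercept)"]    # the intercept on its own
summary(fit)$coefficients   # matrix of estimates, std. errors, t values, p-values
"K" %in% names(coef(fit))   # checks whether K ended up in the model
```

Storing the model in an object first means summary(fit) and coef(fit) both work from the same fit, without refitting.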

score 0 · Answer 2 · answered Jun 04 '18 at 00:18

You can first remove the column K, and then do fit <- lm(L ~ ., data = foo). This will treat the L column as the dependent variable and all the other columns as the independent variables. You don't have to specify each column name in the formula.

Here is an example using the mtcars data set, fitting a multiple regression model for mpg on all the other variables except carb.

mtcars2 <- mtcars[, !names(mtcars) %in% "carb"]

fit <- lm(mpg ~ ., data = mtcars2)

summary(fit)

# Call:
# lm(formula = mpg ~ ., data = mtcars2)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.3038 -1.6964 -0.1796  1.1802  4.7245 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept) 12.83084   18.18671   0.706  0.48790   
# cyl         -0.16881    0.99544  -0.170  0.86689   
# disp         0.01623    0.01290   1.259  0.22137   
# hp          -0.02424    0.01811  -1.339  0.19428   
# drat         0.70590    1.56553   0.451  0.65647   
# wt          -4.03214    1.33252  -3.026  0.00621 **
# qsec         0.86829    0.68874   1.261  0.22063   
# vs           0.36470    2.05009   0.178  0.86043   
# am           2.55093    2.00826   1.270  0.21728   
# gear         0.50294    1.32287   0.380  0.70745   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 2.593 on 22 degrees of freedom
# Multiple R-squared:  0.8687,  Adjusted R-squared:  0.8149 
# F-statistic: 16.17 on 9 and 22 DF,  p-value: 9.244e-08
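As a quick check, dropping the column first and excluding it inside the formula should give the same model. The sketch below compares the two on mtcars; the names fit_drop and fit_formula are made up for this illustration.

```r
# Approach from this answer: drop the column, then use every remaining variable
mtcars2 <- mtcars[, !names(mtcars) %in% "carb"]
fit_drop <- lm(mpg ~ ., data = mtcars2)

# Alternative: leave the data frame untouched and exclude carb in the formula
fit_formula <- lm(mpg ~ . - carb, data = mtcars)

# Compare the two coefficient vectors (names and values)
all.equal(coef(fit_drop), coef(fit_formula))
```

Excluding the variable in the formula avoids keeping a second copy of the data around, while dropping the column can be handier if several models will use the same reduced set of variables.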
