I have a question regarding the lm() function in R. I understand that lm() is used for regression modeling, and I know that one can do this:

lm(response ~ explanatory1 + explanatory2 + ... + explanatoryN, data = dataset)

Now my question is: "Suppose that N is large. Is there a shortcut I can use that doesn't require writing out all N variable names?"

Thanks in advance!

Edit: I left out a big part of the question that I really needed an answer to. Suppose that I wanted to remove k of the explanatory variables and include only the remaining N-k.

Try Khov
  • Yeah: https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame – Ryan Morton Apr 19 '18 at 22:19
  • Are there explanatory variables in dataset that are not needed? – Onyambu Apr 19 '18 at 22:20
  • If you have large numbers of variables with many included and many excluded, you can try to take advantage of regularities in column names. For example, `lm(reformulate(paste0("explanatory", 1:5), "response"), data=dataset)`. – eipi10 Apr 19 '18 at 22:30
  • @Onyambu my apologies. Yes! I edited my question! – Try Khov Apr 19 '18 at 22:31

4 Answers

In a formula, the dot stands for every column of the data other than the response. You can combine it with the minus sign to drop the columns that should not be used as predictors.

lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)

Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)

Coefficients:
(Intercept)  Sepal.Width  Petal.Width  
     3.4573       0.3991       0.9721  
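
When the columns to drop are held in a character vector rather than typed into the formula, one possible sketch is to compute the kept predictors with setdiff() and fit on a subset of the data frame. The names response, drop_vars, and keep_vars are illustrative and not part of the answer above.

```r
# Illustrative names (assumptions): response, drop_vars, keep_vars.
response  <- "Sepal.Length"
drop_vars <- c("Species", "Petal.Length")

# Every column except the response and the excluded ones.
keep_vars <- setdiff(names(iris), c(response, drop_vars))

# Fit on the reduced data frame so that the dot expands only to keep_vars.
fit_subset <- lm(Sepal.Length ~ ., data = iris[c(response, keep_vars)])

# The same model written with the dot-and-minus formula, for comparison.
fit_minus <- lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)

# Compare the estimated coefficients of the two fits.
all.equal(coef(fit_subset), coef(fit_minus))
```

Subsetting the data keeps the formula short even when k is large, at the cost of having to rebuild the subset whenever the exclusion list changes.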
Lennyy

Assuming mtcars as an example:

I would first capture the predictor names in a character vector. This is a basic example, but the same logic works if you select the names with a regex via grep (see below). Here I use every column except the first one ("mpg"), which is the response.

predictors <- names(mtcars)[-1] 
# [1] "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

myFormula <- paste("mpg ~ ", paste0(predictors, collapse = " + "))
# [1] "mpg ~  cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb"

# Convert the string to a formula object explicitly before fitting
lm(formula = as.formula(myFormula), data = mtcars)
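
As an alternative to pasting a string, base R's reformulate() (suggested in a comment on the question) builds the formula object directly from character vectors. A minimal sketch with the same predictors, where f and fit are illustrative names:

```r
# Same predictors as above: every column of mtcars except the first ("mpg").
predictors <- names(mtcars)[-1]

# reformulate() returns a formula object, so no string conversion is needed.
f <- reformulate(termlabels = predictors, response = "mpg")

fit <- lm(f, data = mtcars)
```

This avoids handling the " + " separators by hand.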

Regex Example

Using iris as an example, suppose I want to match all the column names containing "Petal".

predictors <- grep(x = names(iris), pattern = "Petal", value = TRUE)
#[1] "Petal.Length" "Petal.Width" 

myFormula <- paste("Sepal.Width ~ ", paste0(predictors, collapse = " + "))
# [1] "Sepal.Width ~  Petal.Length + Petal.Width"

# Convert the string to a formula object explicitly before fitting
lm(formula = as.formula(myFormula), data = iris)
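
The same pattern can be turned around to exclude columns, which is closer to the edited question. The sketch below assumes the goal is to drop every column whose name contains "Petal"; it relies on grep's invert argument, and the names response and candidates are illustrative.

```r
# Illustrative names (assumptions): response, candidates.
response   <- "Sepal.Width"
candidates <- setdiff(names(iris), response)

# invert = TRUE keeps the names that do NOT match the pattern.
predictors <- grep(pattern = "Petal", x = candidates, value = TRUE, invert = TRUE)

myFormula <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))

fit <- lm(myFormula, data = iris)
```

Removing the response from the candidate names first keeps it from appearing on both sides of the formula.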
Pasqui

You can use the dot (.), which stands for every column in the data other than the response:

lm(response~., data = data)
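
To see what the dot expands to before fitting, one option is to call terms() with a data argument. This sketch uses mtcars with mpg as the response as a stand-in for the placeholder names above; expanded is an illustrative name.

```r
# terms() with a data argument replaces the dot with the actual column names.
expanded <- terms(mpg ~ ., data = mtcars)

# The predictors that lm(mpg ~ ., data = mtcars) would use.
attr(expanded, "term.labels")
```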

Conor Neilson

You can just use a dot: lm(response ~ ., data = dataset)

Example using the mtcars dataset (already included in R):

ex <- lm(mpg ~ ., data = mtcars)
summary(ex)
Felipe Dalla Lana