How to use the * operator in lm() in R when the independent variable is a matrix

Question

I'm fitting several multi-variable linear models using lm().

matrix1 holds the dependent variables (y) and matrix2 the independent ones (x):

model.1<-lm(matrix1[, 1] ~ matrix2)

Here matrix2 has a variable number of columns, depending on which combination of parameters I want in the regression. None of its columns is all zeros.

This statement works fine for a linear model with no interaction between the independent variables (IV), i.e. a model like a0 + a1*x1 + a2*x2 + ... To introduce interactions between the IVs, the manual says to put the * operator between the variables (model.1 <- lm(matrix1[, 1] ~ x1 * x2 * x3)). How can I apply this when the IVs are in a matrix?

Welcome to R on StackOverflow. Please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create a MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then I suggest you edit and improve your question accordingly. I.e., provide some input data (maybe with one of the many example datasets, which are included in R), use the SO formatting options (format code as code) etc. — lukeA, Apr 27 '16 at 21:53

G. Grothendieck · Answer 1 · 2016-04-28T10:45:57.713

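1) SO questions are supposed to provide the test data reproducibly, but here we have done it for you using the built-in data frame anscombe. After defining the test data, we define a data frame containing the columns we want and the appropriate formula. In the formula, the dot stands for every column of DF other than the response, and ^n expands to all interactions up to order n, so with n equal to the number of columns of matrix2 it gives the same terms as x1 * x2 * x3 * x4. Finally we call lm: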

# test data
matrix1 <- as.matrix(anscombe[5:8])
matrix2 <- as.matrix(anscombe[1:4])

DF <- data.frame(matrix1[, 1, drop = FALSE], matrix2) # cols are y1, x1, x2, x3, x4
fo <- sprintf("%s ~ (.)^%d", colnames(matrix1)[1], ncol(matrix2))  # "y1 ~ (.)^4"

lm(fo, DF)

giving:

Call:
lm(formula = fo, data = DF)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2  
    12.8199      -2.6037           NA           NA      -0.1626       0.3628  
      x1:x3        x1:x4        x2:x3        x2:x4        x3:x4     x1:x2:x3  
         NA           NA           NA           NA           NA      -0.0134  
   x1:x2:x4     x1:x3:x4     x2:x3:x4  x1:x2:x3:x4  
         NA           NA           NA           NA
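
As a rough check of the two points behind this output, the sketch below compares the terms produced by (.)^4 with those of an explicit x1 * x2 * x3 * x4 formula, and then looks at why so many coefficients are NA. The names t_pow, t_star, same_terms and collinear are introduced here for illustration only and are not part of the original answer.

```r
# Same columns as DF in the answer: y1 followed by x1..x4
DF <- anscombe[c("y1", "x1", "x2", "x3", "x4")]

# Term labels produced by the two ways of writing the formula
t_pow  <- attr(terms(y1 ~ (.)^4, data = DF), "term.labels")
t_star <- attr(terms(y1 ~ x1 * x2 * x3 * x4), "term.labels")

# Both formulas contain the same set of terms (only the ordering may differ)
same_terms <- setequal(t_pow, t_star)

# In anscombe, x1, x2 and x3 hold identical values, so most terms are
# linearly dependent on others and lm() reports them as NA
collinear <- all(DF$x1 == DF$x2) && all(DF$x2 == DF$x3)

print(same_terms)
print(collinear)
```

The NA coefficients are therefore a property of this particular test data, not of the formula construction; with independent columns in matrix2 the same formula would estimate every term, provided there are enough observations.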

2) A variation of this which gives a slightly nicer result in the Call: part of the lm output is the following. We use DF from above. do.call will pass the contents of the fo variable rather than its name so that we see the formula in the Call: part of the output. On the other hand, quote(DF) is used to force the name DF to display rather than the contents of the data.frame.

lhs <- colnames(matrix1)[1]
rhs <- paste(colnames(matrix2), collapse = "*")
fo <- paste(lhs, rhs, sep = " ~ ")  # "y1 ~ x1*x2*x3*x4"
do.call("lm", list(fo, quote(DF)))

giving:

Call:
lm(formula = "y1 ~ x1*x2*x3*x4", data = DF)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2  
    12.8199      -2.6037           NA           NA      -0.1626       0.3628  
      x1:x3        x2:x3        x1:x4        x2:x4        x3:x4     x1:x2:x3  
         NA           NA           NA           NA           NA      -0.0134  
   x1:x2:x4     x1:x3:x4     x2:x3:x4  x1:x2:x3:x4  
         NA           NA           NA           NA
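
Since the question mentions fitting several models, one per column of matrix1, the same idea can be wrapped in a loop. This is only a sketch under the assumption that both matrices have column names; fits and the use of as.formula are choices made here, not something the original answer prescribes.

```r
# Test data as in the answer
matrix1 <- as.matrix(anscombe[5:8])  # responses y1..y4
matrix2 <- as.matrix(anscombe[1:4])  # predictors x1..x4

# One data frame holding all responses and all predictors
DF <- data.frame(matrix1, matrix2)

# Right-hand side with all interactions: "x1 * x2 * x3 * x4"
rhs <- paste(colnames(matrix2), collapse = " * ")

# Fit one model per response column, building each formula from its name
fits <- lapply(colnames(matrix1), function(y) {
  lm(as.formula(paste(y, "~", rhs)), data = DF)
})
names(fits) <- colnames(matrix1)

# Coefficients of the first model, as in the output above
print(coef(fits$y1))
```

Because the right-hand side is built from colnames(matrix2), it adapts automatically when matrix2 has a different number of columns.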
