1

I'm trying to make different lm models using the same data, but show higher order interactions by changing the exponent. I'm trying to make this in a for loop rather than writing multiple lines. But I'm getting an "invalid power in formula" error.

for (m in 1:5){
  assign(paste("lm.", m, sep = ""), lm(paste("Response ~ (Factor1+Factor2+Factor3+Factor4+Factor5)^", m, sep = "")))
}

Q: What is causing this and how do I fix it?

Secondly, I would like to have the number of factors in the lm function (Factor1+Factor2+...) be pasted in, as the number of Factors changes.

I also have not been able to see if this works yet due to the above error.

I tried:

for (m in 1:n.factors){
  # Make a string to paste in lm()
  assign(paste("lm.paste.", m, sep = ""), paste("Response ~ (", paste(factors, collapse="+"), ")^", m, sep = ""))

  # Paste in string to lm()
  assign(paste("lm.", m, sep = ""), lm(lm.paste.1, data=data.df))
}

Where "factors" is a vector containing strings of my factors ("Factor1", "Factor2" ...). And n.factors is the number of factors.

Q: Is this the correct way of doing this? It feels a little cumbersome.

Maybe there is some way to use

lm(Response ~ ., data = data.df)

And subset the .?

Here is My Data Frame.

Edit:

dput(head(data.df, 20))

structure(list(Factor1 = c(-1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 
-1, 1, -1, 1, -1, 1, -1, 1, -1, 1), Factor2 = c(-1, -1, 1, 1, 
-1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1), Factor3 = c(-1, 
-1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 
-1), Factor4 = c(-1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 
1, 1, 1, 1, -1, -1, -1, -1), Factor5 = c(-1, -1, -1, -1, -1, 
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1), Response = c(680.45, 
722.48, 702.14, 666.93, 703.67, 642.14, 692.98, 669.26, 491.58, 
475.52, 478.76, 568.23, 444.72, 410.37, 428.51, 491.47, 607.34, 
620.8, 610.55, 638.04), Order = c(17, 30, 14, 8, 32, 20, 26, 
24, 10, 16, 27, 18, 3, 19, 31, 15, 12, 1, 4, 23)), .Names = c("Factor1", 
"Factor2", "Factor3", "Factor4", "Factor5", "Response", "Order"
), row.names = c(NA, 20L), class = "data.frame")
  • 2
    Avoid using assign, read about `lapply`, keep outputs in a list. – zx8754 Sep 21 '17 at 10:07
  • 1
    Relevant/Possible duplicate post about running lm on all combinations: https://stackoverflow.com/questions/28606549/how-to-run-lm-models-using-all-possible-combinations-of-several-variables-and-a – zx8754 Sep 21 '17 at 10:11
  • 1
    You could look at the [many models section in R for Data Science](http://r4ds.had.co.nz/many-models.html) for fitting several models – FlorianGD Sep 21 '17 at 10:16
  • You should consider using `poly()` instead of `^`. – Axeman Sep 21 '17 at 12:07

1 Answers1

4

Your problem is simply that 1 isn't an acceptable value for powers in lm:

> lm(mpg ~ cyl^1, mtcars)
Error in terms.formula(formula, data = data) : invalid power in formula

If you change your iterator to use 2:5 then it will run:

> for (m in 2:5) {lm(paste0("mpg ~ cyl^",m), mtcars);print("ok")}
[1] "ok"
[1] "ok"
[1] "ok"
[1] "ok"
> for (m in 1:5) tryCatch({lm(paste0("mpg ~ cyl^",m), mtcars);print("ok")}, error = function(e) warning(e))
[1] "ok"
[1] "ok"
[1] "ok"
[1] "ok"
Warning message:
In terms.formula(formula, data = data) : invalid power in formula

You could write some code that returns an empty string if the value is 1, or "^m" if the value is not 1, and concat that onto the end of your formula if you must have the ^1 version in the loop.

To answer your second question, I would strongly recommend never using assign in this sort of work. Once your lm objects are in the global environment, it's very difficult to get them back to do more things with them. If you wanted to expand your m to 20, or if you wanted to generalise your code for any given m, it would be extremely difficult.

A better option is to store your lm objects in a list. You can do this very simply by changing your for loop into a call to lapply:

> results <- lapply(2:5, function(x) lm(paste0("mpg ~ cyl^",x), mtcars))
> str(results)
List of 4
 $ :List of 12
  ...snip...
 $ :List of 12
  ...snip... 
 $ :List of 12
  ...snip...
 $ :List of 12
  ...snip...

Now you can iterate through your list to do "something" to every model all at once:

> purrr::map(results, function(x) x$coefficients)
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[3]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[4]]
(Intercept)         cyl 
   37.88458    -2.87579 

and this is generalisable to any function you want to apply to a list of lm objects, for any m, or indeed for any list of lm objects or even any list of any kind of object, for example:

> purrr::map(results, broom::tidy)
[[1]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[2]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[3]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[4]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

Now you have a list of dataframes, where each data frame contains some interesting information about the model. You could use purrr::map to, say, check the p-values in each data frame in turn and do things with them, or create 4 plots of the fitted regression line, or do whatever else you fancy.

Mike Stanley
  • 1,420
  • 11
  • 13
  • Thank you for your clear explanation! I have gotten around the first problem by adding a if() statement, where if(m ==1) will make a lm model without the exponent. Now I will look at the second part. Thanks again! – Eggywontgrow Sep 21 '17 at 11:06