I have a large number of predictors in my dataset, and I want to fit a simple linear regression on each of them. So I wrote a loop. My code is as follows:

m = ncol(finalmev)
predictorlist = colnames(finalmev)[2:m]

for (i in predictorlist){
    model <- summary(lm(paste("ODR ~", i[[1]]), data=finalmev))
}

However, when I run the loop, I get the following error:

> for (i in predictorlist){
+     model <- summary(lm(paste("ODR ~", i[[1]]), data=finalmev))
+ }
Error in str2lang(x) : <text>:1:25: unexpected numeric constant
1: ODR ~ Overnight.Deposit 1

What does this error mean? Is there anything wrong with my code or with my data?

Uwe Keim
fhaney
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What exactly is in `predictorlist`? It looks like you might have invalid variable names. You get the same error with `lm("mpg ~ wt 1", mtcars)`. – MrFlick Jul 02 '21 at 05:24
  • If you are looking to do model selection with different combinations of predictors, can I suggest the `leaps` package? The function `regsubsets` is particularly useful for generating and evaluating all types of linear regression, especially if you couple it with the exhaustive and really.big options. – Adam Quek Jul 02 '21 at 05:53
  • Don't use a loop. Use nlme::lmList. – Roland Jul 02 '21 at 06:18
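
The parse failure mentioned in the first comment can be reproduced without fitting any model. The snippet below is only a sketch: it feeds the same kind of text to str2lang(), the function named in the error message, and then shows that backticks turn the name into a single symbol. The name `wt 1` is made up for illustration and does not exist in mtcars.

```r
# Formula text of the kind paste() builds when a column name contains a space
bad <- "mpg ~ wt 1"

# Capture the parse error instead of stopping the script
msg <- tryCatch(
  {
    str2lang(bad)
    "parsed"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Backticks make `wt 1` one symbol, so the text parses as a formula call
good <- str2lang("mpg ~ `wt 1`")
print(all.vars(good))
```

The parser reads the unquoted text as the symbol wt followed by the number 1, which is not valid R syntax.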

2 Answers

The current code overwrites model at every iteration, so only the last fit survives. You may want to create a list to store the results.

predictorlist = colnames(finalmev)[-1]
model_list <- vector('list', length(predictorlist))

for (i in seq_along(predictorlist)) {
  model_list[[i]] <- summary(lm(paste("ODR ~", predictorlist[i]), data=finalmev))
}

Or use lapply:

result <- lapply(predictorlist, function(x) summary(lm(paste("ODR ~", x), data=finalmev)))
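
As a possible extension of the list approach, the results can be named by predictor so that each model can be looked up by name. This is a minimal sketch on the built-in mtcars data, with mpg standing in for ODR; the names predictors, fits and slopes are hypothetical and do not come from the question.

```r
# mtcars ships with R; "mpg" plays the role of ODR here (illustration only)
predictors <- setdiff(colnames(mtcars), "mpg")

# simplify = FALSE keeps a list; USE.NAMES = TRUE names it by predictor
fits <- sapply(
  predictors,
  function(x) summary(lm(paste("mpg ~", x), data = mtcars)),
  simplify = FALSE, USE.NAMES = TRUE
)

# Look up one model by name instead of by position
print(fits[["wt"]]$r.squared)

# Collect the slope estimate and its p-value for every predictor in one matrix
slopes <- do.call(
  rbind,
  lapply(fits, function(s) s$coefficients[2, c("Estimate", "Pr(>|t|)")])
)
print(slopes)
```

These column names are all syntactic, so paste() is enough here; names containing spaces would still need backticks.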
Ronak Shah

It seems that you have a column with a space in its name. Such a name has to be quoted with backticks in the formula, as shown below:

# create a data set
set.seed(1)
finalmev <- data.frame(ODR = 1:4, 
                       `Overnight.Deposit 1` = rnorm(4),
                       `Overnight.Deposit 2` = rnorm(4), 
                       check.names = FALSE)

# reproduce the error
predictorlist <- colnames(finalmev)[2:NCOL(finalmev)]

for (i in predictorlist){
  model <- summary(lm(paste("ODR ~", i[[1]]), data=finalmev))
}
#R> Error in str2lang(x) : <text>:1:25: unexpected numeric constant
#R> 1: ODR ~ Overnight.Deposit 1
#R>                             ^

# fix the error using quotes
for (i in predictorlist)
  model <- summary(lm(sprintf("ODR ~ `%s`", i[[1]]), data=finalmev))

# actually save all the output as pointed out by Ronak Shah
res <- lapply(
  tail(colnames(finalmev), -1), 
  function(x) eval(bquote(summary(lm(.(sprintf("ODR ~ `%s`", x)), 
                                     data=finalmev)))))

# show the result
res
#R> [[1]]
#R> 
#R> Call:
#R> lm(formula = "ODR ~ `Overnight.Deposit 1`", data = finalmev)
#R> 
#R> Residuals:
#R>       1       2       3       4 
#R> -0.9534 -0.5809  1.2087  0.3256 
#R> 
#R> Coefficients:
#R>                       Estimate Std. Error t value Pr(>|t|)  
#R> (Intercept)             2.4386     0.5950   4.098   0.0547 .
#R> `Overnight.Deposit 1`   0.7746     0.6213   1.247   0.3387  
#R> ---
#R> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R> 
#R> Residual standard error: 1.186 on 2 degrees of freedom
#R> Multiple R-squared:  0.4374,    Adjusted R-squared:  0.156 
#R> F-statistic: 1.555 on 1 and 2 DF,  p-value: 0.3387
#R> 
#R> 
#R> [[2]]
#R> 
#R> Call:
#R> lm(formula = "ODR ~ `Overnight.Deposit 2`", data = finalmev)
#R> 
#R> Residuals:
#R>       1       2       3       4 
#R> -1.6293  0.3902  0.2308  1.0083 
#R> 
#R> Coefficients:
#R>                       Estimate Std. Error t value Pr(>|t|)  
#R> (Intercept)             2.3372     0.7282   3.209   0.0849 .
#R> `Overnight.Deposit 2`   0.8865     1.1645   0.761   0.5260  
#R> ---
#R> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R> 
#R> Residual standard error: 1.392 on 2 degrees of freedom
#R> Multiple R-squared:  0.2247,    Adjusted R-squared:  -0.163 
#R> F-statistic: 0.5795 on 1 and 2 DF,  p-value: 0.526
#R> 

I use eval(bquote(...)) so that the Call line of each summary shows the actual formula string instead of the sprintf() expression that produced it. Note that colnames(finalmev)[2:ncol(finalmev)] can be written more simply as tail(colnames(finalmev), -1). As Ronak Shah points out, your for loop only keeps the output of the last iteration.
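
To see what eval(bquote(...)) changes, here is a small sketch on the built-in mtcars data that compares the stored call with and without it. The variable names frm, plain and quoted are made up for this illustration.

```r
frm <- "mpg ~ wt"

# Plain call: lm() records the unevaluated argument, so the call shows `frm`
plain <- lm(frm, data = mtcars)
print(plain$call)

# bquote() substitutes the value of frm into the call before eval() runs it
quoted <- eval(bquote(lm(.(frm), data = mtcars)))
print(quoted$call)

# The fitted coefficients are the same either way; only the recorded call differs
print(all.equal(coef(plain), coef(quoted)))
```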

Two other alternatives are:

# move out sprintf
res1 <- lapply(sprintf("ODR ~ `%s`", tail(colnames(finalmev), -1)), 
               function(frm) eval(bquote(summary(lm(.(frm), data = finalmev)))))

# in R 4.1.0 or greater
res2 <- tail(colnames(finalmev), -1) |> 
  sprintf(fmt = "ODR ~ `%s`") |> 
  lapply(\(frm) eval(bquote(summary(lm(.(frm), data = finalmev)))))

# we get the same result
all.equal(res, res2)
#R> [1] TRUE
all.equal(res1, res2)
#R> [1] TRUE
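
Two further options, neither of which appears in the answers above, are sketched here as assumptions about what might suit your data: renaming the columns once with make.names(), or building formula objects with reformulate() and backquoting each term yourself. The toy data set is recreated so that the snippet runs on its own; clean, fits_clean and fits_ref are hypothetical names.

```r
# Recreate the toy data set with spaces in the column names
set.seed(1)
finalmev <- data.frame(ODR = 1:4,
                       `Overnight.Deposit 1` = rnorm(4),
                       `Overnight.Deposit 2` = rnorm(4),
                       check.names = FALSE)

# Option 1: sanitise the names once; make.names() turns the space into a dot
clean <- finalmev
names(clean) <- make.names(names(clean), unique = TRUE)
print(names(clean))

fits_clean <- lapply(
  names(clean)[-1],
  function(x) summary(lm(paste("ODR ~", x), data = clean))
)

# Option 2: keep the original names and build a formula object per predictor;
# reformulate() parses its terms, so each name is wrapped in backticks first
fits_ref <- lapply(
  names(finalmev)[-1],
  function(x) summary(lm(reformulate(sprintf("`%s`", x), response = "ODR"),
                         data = finalmev))
)
```

Renaming changes the labels that appear in the output, so it may not be appropriate if the original column names have to be preserved.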