Iteration of columns for linear regression in R

Question

I am trying to select columns one at a time in order to run a linear regression on each.

I tried something like this, but it does not seem to work:

df <- 0
x <- 0
for(i in 1:30){
  reg.A_i <- lm(log(match("A", i, sep="_"))~  log(A_0) + B + C , data=y)
  x <- coef(summary(reg.A_i))
  df <- cbind(df[,1],x)
}

My data frame has variables like this:

A_0, A_1, A_2, A_3, ..., A_30, B, C

Three questions for you: (1) Do you know what `match` returns? (2) Do you know what `cbind(df[,1], x)` does? (3) Do you know what `for(i in 1:30) reg.A_i` does? — Señor O, Aug 08 '13 at 15:06
you need to use `as.formula` after using `paste`. Have a look [here](http://stackoverflow.com/questions/18067519/using-r-to-do-a-regression-with-multiple-dependent-and-multiple-independent-vari/18069211#18069211) — Metrics, Aug 08 '13 at 15:17
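
As a rough sketch of the `paste` plus `as.formula` approach suggested in the comment above, the code below builds one formula per response column and collects the coefficient estimates side by side. The simulated data frame `y`, its three response columns, and the names `fit_one` and `estimates` are assumptions made for illustration; the real data has A_1 through A_30.

```r
set.seed(1)
# Hypothetical stand-in for the asker's data frame `y`
# (the A_* columns are kept positive so that log() is defined)
y <- data.frame(A_0 = rnorm(50, mean = 20),
                A_1 = rnorm(50, mean = 30),
                A_2 = rnorm(50, mean = 40),
                A_3 = rnorm(50, mean = 50),
                B = rnorm(50), C = rnorm(50))

# Fit log(A_i) ~ log(A_0) + B + C for one index i and return the estimates
fit_one <- function(i) {
  f <- as.formula(paste0("log(A_", i, ") ~ log(A_0) + B + C"))
  coef(lm(f, data = y))
}

# One column per response variable, one row per coefficient
estimates <- sapply(1:3, fit_one)
colnames(estimates) <- paste0("A_", 1:3)
estimates
```

Building the formula as a string avoids the original attempt to assemble a column name inside `lm()`, and returning a plain vector from each fit lets `sapply` do the column binding.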

Roland · Accepted Answer · 2013-08-08T16:23:12.050

It seems you want something like this:

set.seed(42)
#Some data:
dat <- data.frame(A0=rnorm(100, mean=20), 
                  A1=rnorm(100, mean=30), 
                  A2=rnorm(100, mean=40), 
                  B=rnorm(100), C = rnorm(100))

#reshape your data
library(reshape2)
dat2 <- melt(dat, id.vars=c("A0", "B", "C"), value.name="y")

#do the regressions
library(plyr)
dlply(dat2, .(variable), function(df) {
  fit <- lm(log(y) ~ log(A0) + B + C, data=df)
  coef(summary(fit))
})

# $A1
#                 Estimate  Std. Error    t value     Pr(>|t|)
# (Intercept)  3.323355703 0.173727484 19.1297061 1.613475e-34
# log(A0)      0.024694764 0.057972711  0.4259722 6.710816e-01
# B            0.001001875 0.003545922  0.2825428 7.781356e-01
# C           -0.003843878 0.003045634 -1.2620944 2.099724e-01
# 
# $A2
#                 Estimate  Std. Error    t value     Pr(>|t|)
# (Intercept)  3.903836714 0.145839694 26.7679986 2.589532e-46
# log(A0)     -0.071847318 0.048666580 -1.4763174 1.431314e-01
# B           -0.001431821 0.002976709 -0.4810081 6.316052e-01
# C            0.001999177 0.002556731  0.7819271 4.361817e-01
# 
# attr(,"split_type")
# [1] "data.frame"
# attr(,"split_labels")
# variable
# 1       A1
# 2       A2

Hi Roland, thanks for your answer. However, when I use `dlply`, it can't find `y`. Is `.variable` the problem? I tried to declare it as `.("y")`, but that does not seem to work. — Arnaud, Aug 09 '13 at 12:30
Well, `y` is created when `melt`ing the data. Study the documentation of these functions. — Roland, Aug 09 '13 at 12:31
The help states: ".variables: variables to split data frame by, as as.quoted variables, a formula or character vector". So I want to split my data frame by `y` (correct?). Should I then state `.(variable)` as `dat2$y`? — Arnaud, Aug 09 '13 at 12:57
No. `y` is the dependent variable in the regression, you don't want to split by it. You want to split by the grouping variable, which is named `variable` after `melt`ing. — Roland, Aug 09 '13 at 13:18
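
To make the distinction concrete, here is a base-R sketch of a long format similar to the one `melt` produces, split by `variable` rather than by `y`. It builds the long table by hand instead of calling `melt`, so the names `long` and `fits` are assumptions for illustration and not part of the answer above.

```r
set.seed(42)
dat <- data.frame(A0 = rnorm(100, mean = 20),
                  A1 = rnorm(100, mean = 30),
                  A2 = rnorm(100, mean = 40),
                  B = rnorm(100), C = rnorm(100))

# Stack A1 and A2 into a single column `y`, labelled by `variable`;
# A0, B and C are repeated for each stacked column
long <- do.call(rbind, lapply(c("A1", "A2"), function(nm) {
  data.frame(dat[c("A0", "B", "C")], variable = nm, y = dat[[nm]])
}))
names(long)

# Split on the grouping column, then regress y within each group
fits <- lapply(split(long, long$variable), function(df) {
  coef(summary(lm(log(y) ~ log(A0) + B + C, data = df)))
})
names(fits)
```

Here `y` holds the stacked response values and `variable` records which original column each value came from, which is why the split is on `variable`.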
