Looping through many multiple regressions

Question

I am trying to run this code from this post: looping with iterations over two lists of variables for a multiple regression in R with modified variable and data frame names, because it seems to do exactly what I want and uses a very similar dataset. However, it keeps giving me an error and I don't know why, so I would really appreciate if someone could help me to understand the error or the corresponding line of code so I could try to figure out what's wrong.

for(i in 1:n) {
  vars = names(output)[names(output) %in% paste0(c(".PRE", ".POST"), i)]
  models[[as.character(i)]] = lm(paste("growth_rate ~ ", paste(vars, collapse=" +   ")),
                                 data = output)
}

Error in parse(text = x, keep.source = FALSE) : 
  <text>:2:0: unexpected end of input
1: growth_rate ~  
   ^

My dataset looks almost like the one given in the above mentioned post besides the fact that my "RDPI_T" and "DRY_T" variables are in an alternating order (which I dont think matters in this case). The analogous variables I have are 69 PRE variables called id1.PRE, id2.PRE ... id69.PRE and also 69 POST variables called id1.POST, id2.POST ... id69.POST in the output dataset. Also, growth_rate is in the same dataset called output.

Additionally, I would also like to add 2 more independent variables that are regular and do not come from a list: country and year but I am unsure how to incorporate that here?

Any help would be appreciated. Thank you!

Looping through models to do Wald tests is statistical malpractice. You need statistical consultation in the worst way. — IRTFM, Dec 07 '19 at 05:52
I am replicating a paper that has been cited more than 1000 times and published in The Quarterly Journal of Economics so I am not really questioning the statistics and instead I am trying to find a way to do it. I am happy to hear your suggestions rather than life lessons about how badly I need statistical consultation. — Viktoria Döme, Dec 07 '19 at 06:49
Was there were an appropriate consideration of the multiple comparisons problem that arises from such a process? — IRTFM, Dec 07 '19 at 17:08
They have not mentioned anything like that in the paper, I am afraid. But they came up with a single chi2 value and a p-value after running the 69 models. Basically they wanted to compare whether id.PRE is the same as id.POST (data explained in my previous post: https://stackoverflow.com/questions/59150321/create-lead-and-lag-year-dummies-for-regression-in-r) using the 69 regression models to get an overall conclusion. — Viktoria Döme, Dec 09 '19 at 02:54

score 1 · Accepted Answer · answered Dec 05 '19 at 14:06

If your columns are called id1.PRE, id2.PRE, then the paste function you have above will not work, which most likely throws the error.

Please do dput(head(output)) and paste the output, this allows us to see the column names and why it doesn't work.

Try something below,according to how you describe the column names:

#simulate data
output=data.frame(
"growth_rate"=rnorm(100),
matrix(rnorm(100*69*2),nrow=100)
)
colnames(output)[-1] = c(paste("id",1:69,".PRE",sep=""),paste("id",1:69,".POST",sep=""))
output$year = 1901:2000
output$country = sample(letters,nrow(output),replace=TRUE)

n=69
#create list to hold models
models = vector("list",n)

for(i in 1:n) {
  vars = paste0("id",i,c(".PRE", ".POST"))
# i think it works without as.formula, but better to be safe
  FORMULA = as.formula(paste("growth_rate ~ ", paste(vars, collapse=" +  ")))
  models[[i]] = lm(FORMULA,data = output)
}

If you want to include other variables:

for(i in 1:n) {
  vars = paste0("id",i,c(".PRE", ".POST"))
  # add other variables
  vars = c(vars,"country","year")
  FORMULA = paste("growth_rate ~ ", paste(vars, collapse=" +  "))
  models[[i]] = lm(FORMULA,data = output)
}

Using the newly created models, I am trying to run a Wald test but it does not seem to work on the 69 models at the same time only when I choose to do a Wald test for a single model from the list. Any ideas by any chance? `#install.packages("aod") library(aod) wald.test(b = coef(models[[1]]), Sigma = vcov(models[[1]]), Terms = 1:2) #this one works only wald.test(b = coef(models), Sigma = vcov(models,Terms = 1:2)) #this one does not work` — Viktoria Döme, Dec 06 '19 at 11:26

Looping through many multiple regressions

1 Answers1

Linked