Running several linear regressions from a single dataframe in R

Question

I have a dataset of export trade data for a single country with 21 columns. The first column indicates the years (1962-2014) while the other 20 are trading partners. I am trying to run a separate linear regression between the years column and each of the other columns. I have tried the method recommended in "Running multiple, simple linear regressions from dataframe in R", which entails using

combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)

However, this only yields the intercept for each pair, which is less important to me than the slope of the regressions.

Additionally, I have tried to use my dataset as a time series. However, when I try to run

lm(dimnames~., brazilts, na.action=na.exclude)

(where brazilts is my dataset as a time series from "1962" to "2014") it returns:

Error in model.frame.default(formula = dimnames ~ ., data = brazilts,  : 
  object is not a matrix.

I therefore tried the same method with a matrix but then it returned the error:

Error in model.frame.default(formula = . ~ YEAR, data = brazilmatrix,  : 
  'data' must be a data.frame, not a matrix or an array

(where brazilmatrix is my dataset as a data.matrix which includes a column for years).

Really, I am not proficient in R at this point. The ultimate goal is to create a loop that I can use to run regressions on a significantly larger dataset of gross exports by country pair per year for 28 countries. Perhaps I am attacking this in entirely the wrong way, so any help or criticism is welcome. Bear in mind that the years (1962-2014) are in effect my explanatory variable and the value of gross exports is my dependent variable, which may be throwing off my syntax in the above examples. Thanks in advance!

Posting a sample data set and the expected output makes it much easier to help. — Gopala, May 23 '16 at 15:42
You need to provide some sort of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) if you would like help. We need to see what exactly you have tried and what objects you have tried feeding to these functions. We can't help you with your code if we can't actually see your code. — MrFlick, May 23 '16 at 15:47
Sorry @MrFlick, I'm entirely new to coding and wasn't aware of how to properly ask a question, but that makes total sense. Luckily coffeinjunky gave me a workable answer below. Thanks for offering your help though! — itsdatboi, May 25 '16 at 13:38

coffeinjunky · Accepted Answer · 2018-02-28T22:01:01.733

Just to add an alternative, I would propose going down this route:

library(reshape2)
library(dplyr)
library(broom)

df <- melt(data.frame(x = 1962:2014, 
                      y1 = rnorm(53), 
                      y2 = rnorm(53), 
                      y3 = rnorm(53)), 
          id.vars = "x")

df %>% group_by(variable) %>% do(tidy(lm(value ~ x, data=.)))

Here, I just melt the data so that all relevant columns are given by groups of rows, to be able to use dplyr's grouped actions. This gives the following dataframe as output:

Source: local data frame [6 x 6]
Groups: variable [3]

  variable        term     estimate    std.error  statistic   p.value
    (fctr)       (chr)        (dbl)        (dbl)      (dbl)     (dbl)
1       y1 (Intercept) -3.646666114 18.988154862 -0.1920495 0.8484661
2       y1           x  0.001891627  0.009551103  0.1980533 0.8437907
3       y2 (Intercept) -8.939784046 16.206935047 -0.5516024 0.5836297
4       y2           x  0.004545156  0.008152140  0.5575415 0.5795966
5       y3 (Intercept) 21.699503502 16.785586452  1.2927462 0.2019249
6       y3           x -0.010879271  0.008443204 -1.2885240 0.2033785

This is a pretty convenient form for continuing to work with the coefficients. All that is required is to melt the dataframe so that each original column becomes a group of rows, and then to use dplyr's group_by to run the regression within each group. broom::tidy puts the regression output into a dataframe with one row per coefficient. See ?broom for more information.
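
Since the question is mainly about the slopes, one way to continue from that output is to keep only the rows for the `x` term. The following is a minimal sketch, not part of the original answer; the fixed seed and the object names `wide`, `long` and `slopes` are assumptions made for illustration.

```r
library(reshape2)
library(dplyr)
library(broom)

set.seed(1)  # assumption: fixed seed so the simulated data is repeatable

# Same simulated layout as above: one year column, three response columns
wide <- data.frame(x = 1962:2014,
                   y1 = rnorm(53),
                   y2 = rnorm(53),
                   y3 = rnorm(53))
long <- melt(wide, id.vars = "x")

# One regression per variable, then keep only the slope row of each fit
slopes <- long %>%
  group_by(variable) %>%
  do(tidy(lm(value ~ x, data = .))) %>%
  ungroup() %>%
  filter(term == "x") %>%
  select(variable, estimate, std.error, p.value)

print(slopes)
```

The result has one row per original column, so it can be sorted or joined to other data like any other dataframe.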

In case you need to keep the models to do adjustments of some sort (which are implemented for lm objects), then you can also do the following:

df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))

Source: local data frame [3 x 2]
Groups: <by row>

# A tibble: 3 x 2
  variable      mod
*   <fctr>   <list>
1       y1 <S3: lm>
2       y2 <S3: lm>
3       y3 <S3: lm>

Here, for each variable, the lm object is stored in the dataframe. So, if you want to get the model output for the first variable, you can access it as you would in any normal dataframe, e.g.

tmp <- df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))
tmp[tmp$variable == "y1",]$mod
[[1]]

Call:
lm(formula = value ~ x, data = .)

Coefficients:
(Intercept)            x  
  -1.807255     0.001019

This is convenient if you want to apply some methods to all lm objects since you can use the fact that tmp$mod gives you a list of them, which makes it easy to pass to e.g. lapply.
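
As a rough illustration of that last point, the sketch below rebuilds `tmp` on simulated data and uses `sapply` (a simplifying variant of `lapply`) to pull one number out of each stored model. The fixed seed and the names `slopes` and `r_squared` are assumptions added for the example.

```r
library(reshape2)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the simulated data is repeatable

df <- melt(data.frame(x = 1962:2014,
                      y1 = rnorm(53),
                      y2 = rnorm(53),
                      y3 = rnorm(53)),
           id.vars = "x")

# One row per variable, with the fitted lm object in the list column `mod`
tmp <- df %>% group_by(variable) %>% do(mod = lm(value ~ x, data = .))

# tmp$mod is a plain list of lm objects, so the apply family works on it
slopes <- sapply(tmp$mod, function(m) coef(m)[["x"]])
names(slopes) <- as.character(tmp$variable)

r_squared <- sapply(tmp$mod, function(m) summary(m)$r.squared)
names(r_squared) <- as.character(tmp$variable)

print(slopes)
print(r_squared)
```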

This was incredibly useful and seems to be the way for me to go. I'm dealing with country pairs by year so I used `df %>% group_by(country_origin_id, country_destination_id) %>% do(tidy(lm(year~adjusted_export_val, data=.)))` which breaks it down as a regression for each country pair. My only problem with this method is that it only returns ten rows representing five country pair regression (one row for the intercept and one row for the slope per country pair). How can I see all of the regressions it just did? I'm sure broom has something for this but I can't find it. Thank you so much! — itsdatboi, May 25 '16 at 13:35
Nevermind, I just didn't know how to view the object individually. Figured it out. You have been a serious help, thank you! I've been hammering away at this for too long, wish I would've come across the broom package sooner. Cheers. — itsdatboi, May 25 '16 at 13:51

score 0 · Answer 2 · edited May 23 '16 at 19:21

Quite aside from the statistical justification for doing this, the programming problem is an interesting one. Here is a solution, but probably not the most elegant one. First, create a sample data set:

# Simulated data: 53 years and three random response columns
x <- 1962:2014
y1 <- rnorm(53)
y2 <- rnorm(53)
y3 <- rnorm(53)

# attach() is not needed: x already exists in the workspace
mydata <- data.frame(x, y1, y2, y3)
head(mydata)
#     x         y1          y2         y3
#1 1962 -0.9884054 -1.68208217  0.5980446
#2 1963 -1.0741098  0.51309753  1.0986366
#3 1964  0.1357549 -0.23427820  0.1482258
#4 1965 -0.8846920 -0.60375400  0.7162992
#5 1966 -0.5529187  0.85573739  0.5541827
#6 1967  0.4881922 -0.09360152 -0.5379037

Next, use a for loop to do several regressions:

for (i in 2:4) {
  reg <- lm(x ~ mydata[, i])
  print(reg)
}

Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
  1988.0088      -0.1341  


Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
    1987.87         2.07  


Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
   1987.304       -4.101
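
Note that the loop above regresses the year column `x` on each data column, which is why every intercept sits near 1988. The question treats year as the explanatory variable, so a minimal sketch with the roles swapped might look like the following. It is not part of the original answer; the fixed seed and the names `responses`, `models` and `slopes` are assumptions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the simulated data is repeatable

mydata <- data.frame(x = 1962:2014,
                     y1 = rnorm(53),
                     y2 = rnorm(53),
                     y3 = rnorm(53))

# Every column except the year column is treated as a response
responses <- setdiff(names(mydata), "x")

# Fit response ~ x for each column and keep the models in a named list
models <- list()
for (col in responses) {
  models[[col]] <- lm(reformulate("x", response = col), data = mydata)
}

# Slope of each regression, named by response column
slopes <- sapply(models, function(m) coef(m)[["x"]])
print(slopes)
```

Building the formula with `reformulate` keeps the column name in the model call, so the printed output shows `y1 ~ x` instead of `mydata[, i]`.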

Just as a small comment: if you add `reg <- list()` before the `for` loop, you could store each regression for future use by calling `reg[[i]] <- lm(x ~ mydata[,i])` instead. — coffeinjunky, May 24 '16 at 22:12
