
I am trying to master building functions in R. Say I have a data frame or data.table,

dummy <- data.frame(y, x, a, b, who)

where the vector "who" looks like this:

who <- c("Joseph", "Kim", "Billy")

I would like to use the character vector to fit various regression models and to name the outputs and their summary statistics. For example, for the entry "Billy" in the vector above, I would like something like this:

library(lmtest)  # provides dwtest()

function() {
  # trailing comma: select Billy's rows, keep all columns
  ols.reg.Billy <- lm(y ~ x + a + b, data = dummy[dummy$who == "Billy", ])
  dw.Billy <- dwtest(ols.reg.Billy)

  output.Billy <- list(ols.reg.Billy, dw.Billy)
  return(output.Billy)
}

But I need this for each of the 500 different entries of the who vector.

Is there some way to do this? What's the most efficient way? I keep getting errors and I feel I am seriously missing something. Is there some way to use paste?
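Since no data were posted, here is a hypothetical simulated stand-in for dummy in the question's layout (columns y, x, a, b and who, with 500 names). The values, the made-up names person004 to person500 and the 20 rows per name are assumptions purely for illustration.

```r
# Hypothetical simulated data in the question's layout:
# one row per observation, `who` identifies the person.
set.seed(42)
who <- c("Joseph", "Kim", "Billy", sprintf("person%03d", 4:500))  # 500 names
n_per <- 20                      # rows per name (arbitrary choice)
n <- length(who) * n_per

dummy <- data.frame(
  who = rep(who, each = n_per),
  x   = rnorm(n),
  a   = rnorm(n),
  b   = rnorm(n)
)
# y is a linear function of x, a and b plus noise
dummy$y <- 1 + 2 * dummy$x - dummy$a + 0.5 * dummy$b + rnorm(n)

# Row subset for one person: note the trailing comma
billy_rows <- dummy[dummy$who == "Billy", ]
nrow(billy_rows)  # 20
```

The trailing comma matters for a data.frame: without it, a logical vector selects columns rather than rows. With a data.table, dummy[dummy$who == "Billy"] already selects rows.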

Edited by pogibas; asked by jvalenti.

2 Answers


If this doesn't solve it, please provide a reproducible example. It makes it easier to help you.

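library(lmtest)  # provides dwtest()

outputs <- lapply(who, function(name) {
    # trailing comma selects this person's rows, all columns
    ols.reg <- lm(y ~ x + a + b, data = dummy[dummy$who == name, ])
    dw <- dwtest(ols.reg)
    output <- list(ols.reg, dw)
    # name the elements, e.g. "ols.reg_Billy" and "dw_Billy"
    names(output) <- paste(c("ols.reg", "dw"), name, sep = "_")
    return(output)
})
names(outputs) <- who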
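Here is a self-contained sketch of the same lapply pattern, run on hypothetical simulated data. To avoid package dependencies it swaps dwtest for a small helper, dw_stat (introduced here, not part of lmtest), which computes only the Durbin-Watson statistic sum(diff(e)^2) / sum(e^2) and no p-value.

```r
# Hypothetical simulated stand-in for `dummy`; replace with the real data
set.seed(1)
who <- c("Joseph", "Kim", "Billy")
dummy <- data.frame(
  who = rep(who, each = 30),
  x = rnorm(90), a = rnorm(90), b = rnorm(90)
)
dummy$y <- 1 + 2 * dummy$x - dummy$a + rnorm(90)

# Hypothetical helper: Durbin-Watson statistic only (no test / p-value)
dw_stat <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

outputs <- lapply(who, function(name) {
  fit <- lm(y ~ x + a + b, data = dummy[dummy$who == name, ])
  output <- list(fit, dw_stat(fit))
  names(output) <- paste(c("ols.reg", "dw"), name, sep = "_")
  output
})
# lapply() over an unnamed vector returns an unnamed list, so add names
names(outputs) <- who

outputs[["Billy"]][["dw_Billy"]]
```

Naming the outer list lets you retrieve results by person, e.g. outputs$Billy$ols.reg_Billy, instead of by position.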
answered by csgroen
  • I like this solution a lot; it's intuitive. What if I wanted to name the two elements in list(ols.reg, dw) according to each value of "who"? – jvalenti Aug 22 '17 at 00:25
  • You can add names to the list, using the current "who" value, before returning it. I updated the solution. – csgroen Aug 22 '17 at 00:47
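Once the list is named, per-person summary statistics can also be gathered into a single data frame. This is a sketch on hypothetical simulated data; the choice of statistics (slope of x, R squared, Durbin-Watson statistic) is arbitrary and only for illustration.

```r
# Hypothetical simulated data
set.seed(1)
who <- c("Joseph", "Kim", "Billy")
dummy <- data.frame(who = rep(who, each = 30),
                    x = rnorm(90), a = rnorm(90), b = rnorm(90))
dummy$y <- 1 + 2 * dummy$x - dummy$a + rnorm(90)

# One model per name; setNames() attaches the names in one step
fits <- setNames(
  lapply(who, function(name) lm(y ~ x + a + b, data = dummy[dummy$who == name, ])),
  who
)

# One row of summary statistics per name
stats <- do.call(rbind, lapply(names(fits), function(name) {
  fit <- fits[[name]]
  e <- residuals(fit)
  data.frame(
    who       = name,
    slope_x   = unname(coef(fit)["x"]),
    r_squared = summary(fit)$r.squared,
    dw        = sum(diff(e)^2) / sum(e^2)  # Durbin-Watson statistic
  )
}))
stats
```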

1) Map. Using the built-in CO2 data set, suppose we wish to regress uptake on conc separately for each Type. Note that Map names the components by Type.

Map(function(x) lm(uptake ~ conc, CO2, subset = Type == x), levels(CO2$Type))

giving this two-component list, one component for each level of Type (Quebec and Mississippi). The answer continues after the output.

$Quebec

Call:
lm(formula = uptake ~ conc, data = CO2, subset = Type == x)

Coefficients:
(Intercept)         conc  
   23.50304      0.02308  


$Mississippi

Call:
lm(formula = uptake ~ conc, data = CO2, subset = Type == x)

Coefficients:
(Intercept)         conc  
   15.49754      0.01238  
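With the models in a named list, results can be pulled out by Type. This sketch (not part of the original answer) collects the coefficients into a matrix and checks that subset = gives the same fit as subsetting the data frame first.

```r
# One regression per level of Type; Map() names the result by level
fits <- Map(function(x) lm(uptake ~ conc, CO2, subset = Type == x),
            levels(CO2$Type))

# Coefficient matrix: rows are terms, columns are Types
coefs <- sapply(fits, coef)
coefs

# Sanity check: same coefficients as fitting on pre-subsetted data
direct <- lm(uptake ~ conc, data = CO2[CO2$Type == "Quebec", ])
all.equal(coef(fits$Quebec), coef(direct))
```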

2) Map/do.call. We may wish not only to name the components by Type but also to have x replaced by the actual Type in the Call: line of the output. In that case, use do.call to invoke lm, use quote so that the name of the data frame rather than its value is displayed, and use bquote to substitute the value of x.

reg <- function(x) {
  do.call("lm", list(uptake ~ conc, quote(CO2), subset = bquote(Type == .(x))))
}
Map(reg, levels(CO2$Type))

giving:

$Quebec

Call:
lm(formula = uptake ~ conc, data = CO2, subset = Type == "Quebec")

Coefficients:
(Intercept)         conc  
   23.50304      0.02308  


$Mississippi

Call:
lm(formula = uptake ~ conc, data = CO2, subset = Type == "Mississippi")

Coefficients:
(Intercept)         conc  
   15.49754      0.01238  
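To confirm the substitution programmatically rather than by reading the printed Call: line, you can inspect the call stored in each fitted model. The comparison below is an illustration added here, not part of the original answer.

```r
# First approach: the stored call keeps the symbol x
fits1 <- Map(function(x) lm(uptake ~ conc, CO2, subset = Type == x),
             levels(CO2$Type))
fits1$Quebec$call$subset   # Type == x

# do.call/bquote approach: the literal level is spliced into the call
reg <- function(x) {
  do.call("lm", list(uptake ~ conc, quote(CO2), subset = bquote(Type == .(x))))
}
fits2 <- Map(reg, levels(CO2$Type))
fits2$Quebec$call$subset   # Type == "Quebec"
fits2$Quebec$call$data     # CO2 (a symbol, because of quote())
```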

3) lmList. The nlme package provides lmList for exactly this:

library(nlme)
lmList(uptake ~ conc | Type, CO2, pool = FALSE)

giving:

Call:
  Model: uptake ~ conc | Type 
   Data: CO2 

Coefficients:
            (Intercept)       conc
Quebec         23.50304 0.02308005
Mississippi    15.49754 0.01238113
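If adding nlme as a dependency is undesirable, a base-R route (not from the original answer) is to split the data by Type and fit each piece. Each fit uses the same rows as the corresponding lmList component, so the coefficients should agree with the table above.

```r
# Split CO2 into one data frame per Type, then fit each piece
fits <- lapply(split(CO2, CO2$Type), function(d) lm(uptake ~ conc, data = d))

# Stack coefficients: one row per Type, like lmList's coefficient table
coef_table <- t(sapply(fits, coef))
coef_table
```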
answered by G. Grothendieck