
I would like to be able to write a function that runs regressions in a data.table by groups and then nicely organizes the results. Here is a sample of what I would like to do:

require(data.table)
dtb = data.table(y=1:10, x=10:1, z=sample(1:10), weights=1:10, thedate=1:2)
models = c("y ~ x", "y ~ z")

res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})

#do more stuff with res

I would like to wrap all of this in a function, since the #do more stuff part might be long. The issue I face is how to pass the names of the columns to data.table. For example, how do I pass the column name weights? How do I pass thedate? I envision a prototype that looks like this:

myfun = function(dtb, models, weights, dates)

Let me be clear: passing the formulas to my function is NOT the problem. If the name of the weights column and the name of the date column (thedate) were known in advance, then my function could simply look like this:

 myfun = function(dtb, models) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})

 #do more stuff with res
 }

However, the column names corresponding to thedate and to the weights are not known in advance. I would like to pass them to my function like so:

#this will not work
myfun = function(dtb, models, w, d) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=w, data=.SD))),by=d]})

 #do more stuff with res
 }

Thanks

Alex

3 Answers


Here is a solution that relies on having the data in long format (which makes more sense to me, in this case):

library(reshape2)
dtlong <- data.table(melt(dtb, measure.var = c('x','z')))


foo <- function(f, d, by, w ){
  # get the name of the w argument (weights)
  w.char <- deparse(substitute(w))
  # convert `list(a,b)` to `c('a','b')`
  # obviously, this would have to change depending on how `by` was defined
  by <- unlist(lapply(as.list(as.list(match.call())[['by']])[-1], as.character))
  # create the call substituting the names as required
  .c <- substitute(as.list(coef(lm(f, data = .SD, weights = w))), list(f = f, w = as.name(w.char)))
  # actually perform the calculations
  d[,eval(.c), by = by]
}

foo(f= y~value, d= dtlong, by = list(variable, thedate), w = weights)

   variable thedate (Intercept)       value
1:        x       1   11.000000 -1.00000000
2:        x       2   11.000000 -1.00000000
3:        z       1    1.009595  0.89019190
4:        z       2    7.538462 -0.03846154
mnel

one possible solution:

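fun = function(dtb, models, w_col_name, date_name) {
     # w_col_name and date_name are column names given as character strings
     res = lapply(models, function(f) {
          dtb[, as.list(coef(lm(f, weights=eval(parse(text=w_col_name)), data=.SD))),
              by=eval(parse(text=paste0("list(",date_name,")")))]
     })
     # return the list of per-model results visibly
     res
}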
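
The `eval(parse(text = ...))` calls can probably be avoided. The following is only a sketch, not a tested replacement: it assumes `w` and `d` are column names passed as strings, builds the `j` expression with `bquote()`, and relies on `by` accepting a character vector of column names. The name `myfun_sketch` and the fixed `z` values are made up for the illustration.

```r
library(data.table)

# Hypothetical wrapper: `w` (weights column) and `d` (grouping column) are strings.
myfun_sketch <- function(dtb, models, w, d) {
  lapply(models, function(f) {
    # Splice the formula object and the weights column symbol into the call,
    # so lm() looks the weights up among the columns of .SD.
    j_call <- bquote(
      as.list(coef(lm(.(as.formula(f)), data = .SD, weights = .(as.name(w)))))
    )
    # `d` is not a column of dtb, so data.table treats its value as column names.
    dtb[, eval(j_call), by = d]
  })
}

# Same layout as the question's data, with fixed z values for reproducibility.
dtb <- data.table(y = 1:10, x = 10:1, z = c(3, 1, 4, 10, 5, 9, 2, 6, 8, 7),
                  weights = 1:10, thedate = 1:2)
res <- myfun_sketch(dtb, c("y ~ x", "y ~ z"), w = "weights", d = "thedate")
```

Each element of `res` is a data.table with one row per value of the grouping column and one column per coefficient.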
Alex

Can't you just add (inside that anonymous function call):

 f <- as.formula(f) 

... as a separate line before the dtb[,as.list(coef(lm(f, ...)? That's the usual way of turning a character element into a formula object.

> res = lapply(models, function(f) {f <- as.formula(f)
                 dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
> 
> str(res)
List of 2
 $ :Classes ‘data.table’ and 'data.frame':  2 obs. of  3 variables:
  ..$ thedate    : int [1:2] 1 2
  ..$ (Intercept): num [1:2] 11 11
  ..$ x          : num [1:2] -1 -1
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ :Classes ‘data.table’ and 'data.frame':  2 obs. of  3 variables:
  ..$ thedate    : int [1:2] 1 2
  ..$ (Intercept): num [1:2] 6.27 11.7
  ..$ z          : num [1:2] 0.0633 -0.7995
  ..- attr(*, ".internal.selfref")=<externalptr> 

If you need to build character versions of formulas from component names, just use paste or paste0 and pass to the models character vector. Tested code supplied with receipt of testable examples.
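
A minimal sketch of that last point, assuming the response and predictor names are held in character vectors (`response` and `predictors` are illustrative names, not from the question):

```r
# Build one formula string per predictor from the component names.
response   <- "y"
predictors <- c("x", "z")
models     <- paste0(response, " ~ ", predictors)

# Convert the strings to formula objects where a function needs them.
formulas <- lapply(models, as.formula)
```

Here `models` has the same form as the character vector used in the question, so it can be passed on unchanged.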

IRTFM
  • the formula is not the issue, `lm` takes strings fine. the issue is `weights` and `thedate` – Alex Feb 21 '13 at 19:08
  • i think you might have misunderstood my question. i would like to put all that code into a function – Alex Feb 21 '13 at 19:09
  • I did all that was sensible with the example provided. Please edit your example. – IRTFM Feb 21 '13 at 19:10
  • Yes. You need objects (which have not yet been offered) to pass into a function like: `f3 <- function(f, wts, byvar, dtab) {dtab[ , as.list(coef(lm(f, weights=wts, data=.SD))), by=byvar] }` and you need to use either `mapply` or nest your `lapply` calls. I don't think it makes sense to have all of those objects in one data.table. You already started the process by defining two formulas. If you want multiple weights then you need to define them as well. – IRTFM Feb 21 '13 at 19:23
  • understand. however, these objects are linked. the weights change by date, for example, so it actually makes perfect sense to have them with the data. the basic question is: how do you pass the name `thedate` to use as the by variable in data.table? – Alex Feb 21 '13 at 19:25
  • @Alex `by=eval(thedate)` should answer the basic question in last comment, iiuc. – Matt Dowle Feb 22 '13 at 11:15
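
To illustrate the last comment, here is a small sketch assuming the grouping column's name is stored as a string in a variable. The variable `grp_col` and the summary column `mean_y` are made up for the example, and the variable must not share its name with a column of the table.

```r
library(data.table)

dtb <- data.table(y = 1:10, thedate = 1:2)

# Name of the grouping column, held as a string (hypothetical variable).
grp_col <- "thedate"

# `by` accepts a character vector of column names stored in a variable ...
by_name <- dtb[, .(mean_y = mean(y)), by = grp_col]

# ... and the explicit eval() form from the comment gives the same grouping.
by_eval <- dtb[, .(mean_y = mean(y)), by = eval(grp_col)]
```

Both calls group on the `thedate` column, so the same mechanism can carry a column name passed in as a function argument.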