
I am having trouble with variable scoping when "lm" is called inside a data.table expression that itself sits inside a function. I want to pass a formula to a function I wrote, which then hands it to "lm" within a data.table call. Here is a simple example of what I am trying to accomplish:

require(data.table)
d <- as.data.frame(cbind(seq(1,10), rnorm(1000), rnorm(1000)))
names(d) <- c('firm', 'y', 'x')

lm_example <- function(formula, data, group){
  data <- data.table(data)
  data[ , list(model = list(lm(formula))), by=group]

  # what I want it to evaluate to
  # data[ , list(model = list(lm(y ~ x))), by='firm']
}

lm_example(y ~ x, data=d, group='firm')  # this doesn't work

The example fails with "Error in eval(expr, envir, enclos) : object 'y' not found". As a workaround I have tried all sorts of combinations of "paste", "quote", "eval", "substitute", and "as.formula", but I can't get any of them to work either.

Any help with figuring out the scoping and syntax to get this function to work would be very much appreciated.

Jonathan
  • Possible duplicate of [create a formula in a data.table environment in R](http://stackoverflow.com/questions/14784048/create-a-formula-in-a-data-table-environment-in-r). See also http://stackoverflow.com/questions/15096811/why-is-using-update-on-a-lm-inside-a-grouped-data-table-losing-its-model-data for some issues that may arise with `lm` within data.tables. – mnel Jan 14 '14 at 05:19
  • Also, FYI, in your example "d" is a `data.frame`, not a `data.table`. :-) – A5C1D2H2I1M1N2O1R2T1 Jan 14 '14 at 05:21
  • If you use data.table, you probably have many groups. It's usually more efficient to use the workhorse behind `lm`, i.e. `lm.fit`, directly. – Roland Jan 14 '14 at 08:29
  • @Roland: what are the disadvantages of `lm.fit` over `lm` in such cases; I'm guessing performance is the advantage. Can you suggest a good reference for this approach? Aside: as I'm using `stepAIC` to access `lm` I guess a re-write would be required. – Matt Weller Jan 14 '14 at 09:35
  • @MattWeller The disadvantages are that you can't use methods such as `summary.lm`. But quite often you don't need these methods and are only interested in the coefficients or a few statistical parameters, which you can easily calculate. Also, as `help("lm.fit")` points out, you should be an "experienced user" to use it, i.e., you should be able to construct a design matrix. Btw., you can get even better performance if you directly use the C function that does the actual fit. – Roland Jan 14 '14 at 09:47
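
To make Roland's suggestion concrete, here is a minimal sketch of calling `lm.fit` per group with a hand-built design matrix. The simulated table `dt`, the seed, and the result name `coefs_by_firm` are assumptions for illustration only; the column names mirror the question's example.

```r
library(data.table)

# Hypothetical data shaped like the question's example: 10 firms, 100 rows each.
set.seed(1)
dt <- data.table(firm = rep(1:10, 100), y = rnorm(1000), x = rnorm(1000))

# Build the design matrix by hand (intercept column plus x) and fit with lm.fit.
# Only the coefficients are kept; there is no "lm" object, so summary() is not available.
coefs_by_firm <- dt[, {
  X <- cbind(`(Intercept)` = 1, x = x)
  as.list(lm.fit(X, y)$coefficients)
}, by = firm]

print(coefs_by_firm)
```

This gives one row per firm with the intercept and slope as columns. Anything else (standard errors, R-squared) would have to be computed by hand from the `lm.fit` output, which is the trade-off described in the comments.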

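
One possible way to get the function in the question working: `lm(formula)` without a `data` argument looks up `y` and `x` in the formula's environment (the global environment here), not among the columns of the current group, hence "object 'y' not found". Passing `data = .SD` makes `lm` search the current group's columns first. The sketch below is a minimal version under that assumption; the name `lm_by_group` and the choice to return coefficients rather than whole model objects are illustrative, not part of the original question.

```r
library(data.table)

# Hypothetical wrapper: fit `formula` separately within each level of `group`.
lm_by_group <- function(formula, data, group) {
  dt <- as.data.table(data)
  # .SD holds the current group's rows, so lm() can resolve y and x there.
  # `group` is a character column name, as in the question's call.
  dt[, as.list(coef(lm(formula, data = .SD))), by = group]
}

# Example data shaped like the question's: 10 firms, 100 observations each.
set.seed(42)
d <- data.frame(firm = rep(1:10, 100), y = rnorm(1000), x = rnorm(1000))

res <- lm_by_group(y ~ x, data = d, group = "firm")
print(res)
```

The result has one row per firm with the fitted intercept and slope. Keeping the full model objects in a list column (as the question does) may also work with `data = .SD`, but mnel's second link describes problems with models that hold on to `.SD`, so extracting what you need inside the grouped call seems the safer choice.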