
I am able to carry out something like the following:

foo=data.frame(y=rnorm(100),x1=rnorm(100),x2=rnorm(100),x3=rnorm(100))
full=lm(foo$y ~ foo$x1 + foo$x2 + foo$x3)
nil=lm(foo$y ~ 1)
fwd=step(nil,scope=list(lower=formula(nil),upper=formula(full)),direction='forward')

But I'm working with data.table like so:

library(data.table)
foo=data.table(y=rnorm(100),x1=rnorm(100),x2=rnorm(100),x3=rnorm(100))
full=foo[,lm(y ~ x1 + x2 + x3)]
nil=foo[,lm(y ~ 1)]
fwd=foo[,step(nil,scope=list(lower=formula(nil),upper=formula(full)),direction='forward')]

And I get the error:

Error in eval(expr, envir, enclos) : object 'x1' not found

But x1 is defined inside the J expression above for data.table - is there a way around this without having to convert my table to a data.frame?

Palace Chan
    See SO Post: [using lm(my_formula) inside [.data.table's j](http://stackoverflow.com/questions/19311600/using-lmmy-formula-inside-data-tables-j). – Parfait Aug 10 '15 at 02:25

1 Answer


You need to pass .SD as the data argument to lm inside [.data.table. Otherwise data.table optimizes its j argument so that only the columns referenced there are available.

foo=data.table(y=rnorm(100),x1=rnorm(100),x2=rnorm(100),x3=rnorm(100))
full=foo[,lm(y ~ x1 + x2 + x3,data=.SD)]
nil=foo[,lm(y ~ 1,data=.SD)]
fwd <- step(nil,scope=list(lower=formula(nil),upper=formula(full)),direction='forward')

# using .SD means all columns are available
ls(environment(formula(nil)))
## [1] "x1" "x2" "x3" "y" 

# compared with
nil.noSD =foo[,lm(y ~ 1)]
ls(environment(formula(nil.noSD)))
## [1] "y"

Note that data.table, lm, and the scoping rules for update.formula etc. don't always "play nicely" together.

See "Why is using update on a lm inside a grouped data.table losing its model data?" for an example.
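If no grouping is needed, one way to sidestep these scoping issues is to fit the models outside of j and pass the data.table itself as the data argument. This is only a sketch of an alternative, not part of the approach above; it relies on the fact that a data.table inherits from data.frame. The set.seed() call and trace = 0 are my additions, there only for reproducibility and to silence step's progress output.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
foo <- data.table(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# A data.table inherits from data.frame, so lm() accepts it as `data` directly
full <- lm(y ~ x1 + x2 + x3, data = foo)
nil  <- lm(y ~ 1, data = foo)

# step() re-evaluates the model call; `foo` is visible from the calling
# environment, so every candidate column can be found
fwd <- step(nil,
            scope = list(lower = formula(nil), upper = formula(full)),
            direction = "forward",
            trace = 0)  # trace = 0 suppresses the per-step printout

formula(fwd)  # formula of the selected model
```

No conversion with as.data.frame is involved here, and foo remains a data.table throughout.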

mnel