
I noticed some strange behavior in my R code today. I tried the package {bootStepAIC}, which includes a function, boot.stepAIC(), that bootstraps the results of a stepwise regression based on AIC. However, I do not think the statistical background is the problem here (I hope so).
The function works when I call it at the top level of R. Here is my example code:

library(MASS)
library(bootStepAIC)  # CRAN package that provides boot.stepAIC()

n<-100
x<-rnorm(n); y<-rnorm(n,sd=2); z<-rnorm(n,sd=3); res<-x+y+z+rnorm(n,sd=0.1)
dat.test<-as.data.frame(cbind(x,y,z,res))
form.1<-as.formula(res~x+y+z)
boot.stepAIC(lm(form.1, dat.test),dat.test) # should be OK - works for me

However, I wanted to wrap this in my own function, to which I pass the data and the formula. But then I get an error within boot.stepAIC() saying:

the model fit failed in 100 bootstrap samples
Error in strsplit(nam.vars, ":") : non-character argument

# custom function
fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  fit.lm<-lm(formula=form,data=dat)
  return(boot.stepAIC(object=fit.lm,data=dat))
}
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)
# results in an error 

So where is the mistake? I suppose it must have something to do with the local and global environments, doesn't it?

Siguza
Sebastian
  • I haven't used `boot.stepAIC` before, but I suspect it may also have to do with how the formula is being passed into the function (which is related to the environment issues too). See http://stackoverflow.com/q/6877534, http://stackoverflow.com/q/7666807 for some ideas. In particular, calling `lm` or `boot.stepAIC` via `do.call` may help, as the arguments are then evaluated before being passed in. You may also investigate the `as.name` suggestion in the comments. These issues are tricky; good luck! – Aaron left Stack Overflow Apr 16 '12 at 15:17
  • http://stackoverflow.com/q/8998884/210673 also looks to be the same issue. – Aaron left Stack Overflow Apr 18 '12 at 12:13
  • Yep, I have read through this already. I suppose the issues are connected. – Sebastian Apr 19 '12 at 08:18
  • But maybe my earlier (utterly confusing) post is also related. http://stackoverflow.com/questions/9161273 – Sebastian Apr 19 '12 at 08:24
  • Yes, it seems likely that that other post is related too. It is confusing though, and I wasn't able to recreate some of your errors. See comment there. – Aaron left Stack Overflow Apr 19 '12 at 13:55

1 Answer


Using do.call, as in the question "anova test fails on lme fits created with pasted formula", provides the answer.

When boot.stepAIC runs inside a function, it doesn't have access to form. The same failure can be recreated in the global environment: the call stored by lm refers to the formula by the name form.1, and once form.1 is removed, boot.stepAIC fails.

> form.1<-as.formula(res~x+y+z)
> mm <- lm(form.1, dat.test)
> mm$call
lm(formula = form.1, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
# same error as OP
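The root cause can also be reproduced with base R alone, without bootStepAIC. This is a minimal sketch that assumes, as explained in the comments below, that boot.stepAIC refits the model through update(), which re-evaluates the model's stored call. The helper name fit_inside and the toy data are hypothetical.

```r
# Base R only (stats is attached by default).
# Hypothetical helper: fits lm() where the formula is a *local* variable.
fit_inside <- function(dat, form) {
  lm(formula = form, data = dat)
}

set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20))
d$res <- d$x + d$y + rnorm(20, sd = 0.1)

m <- fit_inside(d, res ~ x + y)
print(m$call)   # the call stores the symbols `form` and `dat`, not their values

# update() rebuilds that call and evaluates it in the caller's environment
# (here the top level), where `form` does not exist, so the refit fails
refit <- try(update(m, data = d), silent = TRUE)
print(inherits(refit, "try-error"))
```

The fitted object itself is fine; only code that re-runs its call from a different environment breaks.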

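Using do.call does work: its arguments are evaluated before the call is built, so the stored call contains the formula itself rather than the name form.1. Here I use as.name as well; otherwise the mm object carries around the entire dataset instead of just its name.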

> form.1<-as.formula(res~x+y+z)
> mm <- do.call("lm", list(form.1, data=as.name("dat.test")))
> mm$call
lm(formula = res ~ x + y + z, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
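The same effect can be checked with base R's update() standing in for boot.stepAIC. In this sketch (helper name fit_inside_docall and the toy data are hypothetical) the model is built inside a function via do.call, and the stored call no longer refers to anything local to that function, so refitting from the top level works.

```r
# Hypothetical helper: takes the *name* of the data frame as a string
fit_inside_docall <- function(dat_name, form) {
  # do.call() evaluates its argument list first, so the call it builds
  # contains the formula object itself plus a symbol for the data
  do.call("lm", list(form, data = as.name(dat_name)))
}

set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20))
d$res <- d$x + d$y + rnorm(20, sd = 0.1)

m2 <- fit_inside_docall("d", res ~ x + y)
print(m2$call)

# Nothing in the stored call refers to variables local to the helper,
# so update() can re-run it from the top level
refit2 <- update(m2)
print(all.equal(coef(refit2), coef(m2)))
```

The data symbol `d` is still looked up when the call is re-run, so this relies on the data frame being visible from wherever the refit happens.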

To apply this to the original problem, I'd do this:

fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  # dat is the *name* of the data frame (a string); as.name() turns it into
  # a symbol, so the stored calls refer to the data by name only
  mm <- do.call("lm", list(form, data=as.name(dat)))
  do.call("boot.stepAIC", list(mm,data=as.name(dat)))
}
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat="dat.test",form=form.1)

This also works, but the entire data set is included in the final output object and is printed to the console as well.

fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  mm <- do.call("lm", list(form, data=dat))
  boot.stepAIC(mm,data=dat)
}    
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)
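The difference between the two versions can be inspected directly in the stored calls. A minimal base-R sketch with hypothetical toy data: passing the data frame itself splices its whole value into the call, while passing as.name() stores only a symbol.

```r
set.seed(1)
d <- data.frame(x = rnorm(100), y = rnorm(100))
d$res <- d$x + d$y + rnorm(100, sd = 0.1)

# Data frame passed by value: the whole object ends up inside the call
by_value <- do.call("lm", list(res ~ x + y, data = d))
# Data frame passed as a symbol: only the name `d` is stored
by_name <- do.call("lm", list(res ~ x + y, data = as.name("d")))

print(is.data.frame(by_value$call$data))
print(is.name(by_name$call$data))

# Length of the deparsed calls, i.e. roughly what gets printed to the console
print(nchar(paste(deparse(by_value$call), collapse = "")))
print(nchar(paste(deparse(by_name$call), collapse = "")))
```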
Community
Aaron left Stack Overflow
  • Thanks. Thanks to the comprehensive explanation, I see the point. I also read the two related posts. Honestly, these issues still give me a headache. What is the "use case" for this behavior? I pass two objects to the function, so it should be executed in the context of the calling function. I see no reason for R or boot.stepAIC (I don't know whom to "blame") to redirect to the global environment. The point is: how can I be certain in which context a function looks for objects? My understanding so far is to always use do.call() rather than calling the function directly. Any strategies for that? – Sebastian Apr 17 '12 at 05:48
  • Well, I played around with that a bit and am still trying to understand the context. In your last example you basically pass the name of the global (or parent) variable, and the function then accesses the global variable dat.test. Is that call by reference? Could it be that the modeling functions sometimes use a call-by-reference strategy, even though I assume R is purely call by value? – Sebastian Apr 17 '12 at 10:58
  • 1) `boot.stepAIC` uses `update`, which reruns the call of the linear model; if the call contains the name of an object local to a function (such as `form`), then that object must be accessible. 2) Each function has an environment (usually the one it was created in) and a chain of parent environments that it searches to find objects. However, running a function within another function does not change this parent chain! At the end of the chain is the global environment, so when `form` is in the global environment, it can be found. But when `form` is only in the environment of the calling function, it can't. – Aaron left Stack Overflow Apr 17 '12 at 15:40
  • Well, I think I got that. Still, I don't understand the problem. I create a function f and pass the formula form (from the global env) to it. Inside the function I create a linear model, so everything should be fine, because both form and the model are in f's environment. Then I call stepAIC, which alters the model. So why is there any need to search the global environment? It should search for the model and the formula inside the environment of f, where it was called from. – Sebastian Apr 17 '12 at 16:26
  • The problems I have still exist. The only reason I created the function was to use it in an apply function, and I am struggling again with the same issues I had before, even with your solution. So I know that I have not yet got the point or understood how this is handled internally. :( – Sebastian Apr 17 '12 at 16:28
  • (meant to send earlier...) 3) This means that your original code works if you set `form <- form.1`, as `boot.stepAIC` can then find `form` in the global environment. However, if you set `form` and `form.1` to different things, you'll get very unexpected results! R is generally a functional language, but these environment chains are perhaps exceptions. In my opinion, it's better to build the call with `do.call` so that going up the chain for the formula isn't necessary. – Aaron left Stack Overflow Apr 17 '12 at 16:42
  • You say: "It should search the model and the formula inside of f environment where it was called from". Perhaps. But it doesn't. Perhaps https://github.com/hadley/devtools/wiki/Scoping and https://github.com/hadley/devtools/wiki/Evaluation would be useful references. – Aaron left Stack Overflow Apr 17 '12 at 16:44
  • Thanks. I'll try to get it working and will use the resources you mentioned. If I fail, I will open a new question :) Thanks for your efforts – Sebastian Apr 17 '12 at 16:54
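Points 2) and 3) in the comments above can be illustrated in base R. This is a sketch with hypothetical names (lookup_form, caller, fit_inside) and toy data, using update() as a stand-in for what boot.stepAIC does internally. The first part shows that a free variable is found where a function was defined, not where it was called; the second shows how an unrelated global `form` can be silently picked up when a stored call is re-run.

```r
# 1) Lexical scoping: a free variable is looked up where the function was
#    *defined* (here the global environment), not where it was *called*
lookup_form <- function() form

caller <- function() {
  form <- "local to caller"   # not visible to lookup_form()
  lookup_form()
}

form <- "global"
scoping_result <- caller()
print(scoping_result)         # "global"

# 2) The same rule lets a re-run call pick up an unrelated global `form`
fit_inside <- function(dat, form) lm(formula = form, data = dat)

set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20))
d$res <- d$x + d$y + rnorm(20, sd = 0.1)

m <- fit_inside(d, res ~ x + y)   # stored call: lm(formula = form, data = dat)
form <- res ~ x                   # a different global `form`
refit <- update(m, data = d)      # re-run at top level, finds the global `form`
print(names(coef(m)))             # "(Intercept)" "x" "y"
print(names(coef(refit)))         # "(Intercept)" "x"
```

No error is raised in the second case, which is why building the call with do.call, so that the formula value is stored in it, is the safer choice.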