
I have a function that takes a data.frame as input and returns a residualized version of it, with some chosen variable as the predictor for every column.

residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= x ~ eval(parse(text=resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}

set.seed(31233)
df = data.frame(Age = c(1,3,6,7,3,8,4,3,2,6),
                Var1 = c(19,45,76,34,83,34,85,34,27,32),
                Var2 = round(rnorm(10)*100))

df.res = residuals.DF(df, "Age", ".test")
df.res
        Age.test   Var1.test  Var2.test
1  -1.696753e-17 -25.1351351  -90.20582
2  -1.318443e-19  -0.8108108   31.91892
3  -5.397735e-18  27.6756757   84.10603
4  -5.927747e-18 -15.1621622 -105.83160
5  -3.807699e-18  37.1891892  -57.08108
6  -6.457759e-18 -16.0000000  -25.76923
7   5.117344e-17  38.3513514  -65.01871
8  -3.807699e-18 -11.8108108   35.91892
9  -3.277687e-18 -17.9729730   97.85655
10 -5.397735e-18 -16.3243243   94.10603

This works fine. However, I often need the eval(parse()) combination when passing variable inputs to lm(), so I decided to write a wrapper function:

#Wrapper function for convenience for evaluating strings
evalparse = function(string) {
  eval(parse(text=string))
}

This works fine when used alone, e.g.:

> evalparse("5+5")
[1] 10

However, if I use it in the function above (i.e. with formula= x ~ evalparse(resid.var)), I get:

> df.res = residuals.DF(df, "Age", ".test")
Error in eval(expr, envir, enclos) : object 'Age' not found 

I suspect this is because, inside the wrapper, eval() evaluates the string in the wrapper's own environment, where the chosen variable does not exist. This does not happen with the bare eval(parse()) combination, because the string is then evaluated where lm() builds its model frame, and there the columns of the data are visible.

Is there some clever solution to this problem? A better way of using dynamic formulas in lm()? Otherwise I will have to keep typing eval(parse(text=object)).
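The diagnosis above suggests a small fix: give the wrapper an envir argument that defaults to parent.frame(), the same default eval() itself uses, so the string is evaluated in the environment the wrapper was called from rather than in its own frame. This is a minimal sketch assuming base R only; the envir argument is an addition, not part of the original wrapper, so treat it as a sketch rather than a tested drop-in.

```r
# Hypothetical variant of the wrapper: evaluate the parsed string in the
# environment of the *caller* instead of in evalparse()'s own frame.
evalparse <- function(string, envir = parent.frame()) {
  eval(parse(text = string), envir = envir)
}

df <- data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                 Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32))

evalparse("5+5")  # plain use is unchanged: [1] 10

# When lm() builds the model frame, it evaluates the right-hand side in an
# environment made from the data; evalparse() is called from there, so
# parent.frame() now sees the column Age.
fit <- lm(Var1 ~ evalparse("Age"), data = df)
coef(fit)
```

With the same default, x ~ evalparse(resid.var) inside residuals.DF() should also work, since resid.var is still found through the formula's enclosing environments.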

CoderGuy123
  • Have you tried `mget()` in place of `eval(parse())`? – Alex A. Apr 03 '15 at 17:25
  • get() worked in the above example; mget() didn't (it returned a list, the wrong type). – CoderGuy123 Apr 03 '15 at 17:46
  • Ah yeah, sorry, I meant `get()`. You could use `mget()` but you'd need `mget()[[1]]` to get the item from the list. If `get()` works for you I'll post it as an answer rather than a comment. – Alex A. Apr 03 '15 at 17:51
  • If `x` and `y` are the *names* of two columns in data frame `DF` then this regresses `y` on `x` (with an intercept): `lm(DF[c(y, x)])` without using `parse`, `eval`, formulas, etc. – G. Grothendieck Apr 03 '15 at 20:17
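The two comment suggestions can be compared on the question's data. This sketch assumes base R only; fit_get and fit_df are illustrative names. get() looks the name up in the environment it is called from, which inside the model frame includes the data columns; lm(DF[c(y, x)]) passes a data frame where the formula would go, and lm() then regresses the first column on the remaining ones.

```r
df <- data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                 Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32))
resid.var <- "Age"   # predictor name held in a string
y <- "Var1"          # response name held in a string

# get() in place of eval(parse()): the name is resolved where the
# model frame is evaluated, so the column Age is found
fit_get <- lm(Var1 ~ get(resid.var), data = df)

# No formula at all: the first column is the response and the
# remaining columns are predictors (with an intercept)
fit_df <- lm(df[c(y, resid.var)])

# Both fits have the same intercept and slope
all.equal(unname(coef(fit_get)), unname(coef(fit_df)))  # TRUE
```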

1 Answer


Any time you want to modify the contents of a formula, you should use update(), because it is designed for exactly this purpose.

In your case, you want to modify your function as follows:

residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= update(x ~ 0, paste0("~",resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}

Basically, update() (specifically its update.formula() method) takes a formula as its first argument and modifies it according to its second argument. To get a handle on it, check out the following examples:

f <- y ~ x
f
# y ~ x
update(f, ~ z)
# y ~ z
update(f, x ~ y)
# x ~ y
update(f, "~ x + y")
# y ~ x + y
update(f, ~ . + z + w)
# y ~ x + z + w
x <- "x"
update(f, paste0("~",x))
# y ~ x

As you can see, the second argument can be a formula or a character string containing one or more variables, and a . stands for the corresponding side of the old formula. This greatly simplifies building a dynamically modified formula when you only want to change one part of it.
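One detail worth checking is why x in the updated formula refers to the column passed to lm_f(): lm() looks for x in data first and, finding no such column, falls back to the formula's environment. The documentation of update.formula() states that the result keeps the environment of the old formula, here the frame of lm_f(). A small check, using make_formula() as a hypothetical stand-in for lm_f():

```r
# Hypothetical stand-in for lm_f(): build the formula the same way the answer does
make_formula <- function(x, resid.var) {
  f <- update(x ~ 0, paste0("~", resid.var))
  # same_env is TRUE if the updated formula still points at this function's
  # frame, which is where the argument x (the data column) lives
  list(formula = f, same_env = identical(environment(f), environment()))
}

out <- make_formula(c(19, 45, 76), "Age")
deparse(out$formula)  # "x ~ Age"
out$same_env          # TRUE
```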

Thomas
  • That is neat, yes. Someone posted a solution based on `as.formula()` before, but decided to delete it apparently. Goes like this: `x = residuals(lm(data=data, formula= as.formula(paste0("x ~ ",resid.var, collapse=""))))` – CoderGuy123 Apr 03 '15 at 18:13
  • Also, don't forget `?reformulate` for these kinds of tasks. – Ben Bolker Apr 03 '15 at 19:33
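Following the reformulate() pointer, here is a sketch of the question's function with the formula built directly from strings, so neither eval(parse()) nor update() is needed. residuals_DF is a hypothetical name chosen to avoid clashing with the question's residuals.DF; like the original, it assumes data has no column named x.

```r
residuals_DF <- function(data, resid.var, suffix = "") {
  lm_f <- function(x) {
    # reformulate() returns the formula x ~ <resid.var>; with a response given,
    # its environment is this call's frame, so lm() finds the column vector x
    # here and the predictor among the columns of data
    f <- reformulate(resid.var, response = "x")
    residuals(lm(f, data = data))
  }
  resid <- data.frame(apply(data, 2, lm_f))
  colnames(resid) <- paste0(colnames(data), suffix)
  resid
}

df <- data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                 Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32))
res <- residuals_DF(df, "Age", ".test")
res$Var1.test[1]  # -25.13514, matching the question's output
```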