Right Way to Replace Names by Strings in Expressions

Question

I have some questions about substituting names in expressions by strings in a consistent way across different functions. Starting from the data frame

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

In lm, I can use different commands to substitute a regressor by a string in the formula

lm(a~get("b"),sample_df) # substituting a part of a formula
lm(a~eval(as.name("b")),sample_df) # substituting a part of a formula
lm(substitute(a~v,list(v=as.name("b"))),sample_df) # substituting the whole formula
lm(eval(substitute(a~v,list(v=as.name("b"))),sample_df)) # substituting the whole formula
eval(substitute(lm(a~v,sample_df),list(v=as.name("b")))) # substituting the whole call

What are the differences between all these commands? I can see that the first two commands give a regressor named get("b") and eval(as.name("b")) respectively, while the others give b. Are there other (maybe more subtle or problematic) differences? Why does eval make no difference between commands 3 and 4?

In data.table, all works like lm

library(data.table)
sample_dt=as.data.table(sample_df)
sample_dt[,mean:=mean(get("b"))]
sample_dt[,eval(substitute(mean:=mean(v),list(v=as.name("b"))))]
eval(substitute( sample_dt[,mean:=mean(v)],list(v=as.name("b"))))

Now, trying to substitute a name by a string in dplyr
```
library(dplyr)
sample_df %>% mutate(mean=mean(get("b")))
eval(substitute(sample_df %>% mutate(mean=mean(v)),list(v=as.name("b"))))
```
The first looks for an object in the global environment, while the second works. How could I have predicted that get would not work here when it works in lm and [.data.table?

Why not `lm(a~b,sample_df)`? That's what's suggested on the help page. — Carl Witthoft, Sep 14 '14 at 16:14
ahah. It's always the same problem when coming up with the simplest example. I really want to substitute by a string - let's say I want to loop on different regressors using their names. — Matthew, Sep 14 '14 at 16:23
So you are going to do something like `for (x in c("b","c")) lm(a~as.name(x),data.frame)`? Plus you want a "nice" output name for use in things like `predict.lm`? — Carl Witthoft, Sep 14 '14 at 16:32
Indeed, I would like exactly the same output whether I use b directly or the string "b". Let's say I want to rewrite things I often do as functions that take a dataframe and a variable name as arguments. — Matthew, Sep 14 '14 at 16:39

IRTFM · Answer 1 · 2014-09-14T17:33:08.360

3

You are setting up your test cases incorrectly for the purpose that was described. You want to pass in various values with a variable that contains the character value:

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
x <- "b"
lm(a~get(x),sample_df) # succeeds
lm(a~eval(as.name(x)),sample_df)  # also succeeds

The more typical way of doing this is to use as.formula outside the lm() call:

form <- as.formula(paste("a ~", x))
form
#a ~ b
lm(form,sample_df)
predict(lm(form,sample_df))
# 1 2 3 4 5 
# 1 2 3 4 5
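
As a rough sketch of how this formula-building step might be wrapped for looping over regressor names: the helper `fit_by_name` and the list `fits` are hypothetical names introduced here for illustration, not part of the original answer.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# Hypothetical helper: build the formula from strings first, then fit.
fit_by_name <- function(data, response, regressor) {
  form <- as.formula(paste(response, "~", regressor))
  lm(form, data = data)
}

# Fit one model per regressor name.
regressors <- c("b", "c")
fits <- lapply(regressors, function(x) fit_by_name(sample_df, "a", x))
names(fits) <- regressors

# The coefficient names use the plain column name,
# just as if the regressor had been typed directly in the formula.
names(coef(fits[["b"]]))
```

One caveat with this sketch: the call stored in each fit still reads `formula = form`, so only the terms and coefficient names carry the substituted column name.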

The advantage of doing this outside the lm() function is that the substitutions are completed before the call is recorded by the lm processing facilities. Compare the output of:

terms(lm(form,sample_df))
terms( lm(a~eval(as.name(x)),sample_df))

It will take a lot of gymnastic "computing on the language" to get back to quote(b) from that second example, whereas it is really easy to get the RHS from the terms()-object if a formula object was passed in:

> terms(lm(form,sample_df))[[3]]
b
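
To make the comparison concrete, here is a small sketch that pulls the term labels and the right-hand side out of both fits. The variable names (`fit_formula`, `fit_inline`, and so on) are assumptions chosen for this example.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
x <- "b"

# Formula built from the string before lm() sees it
fit_formula <- lm(as.formula(paste("a ~", x)), sample_df)
# Substitution left inside the formula
fit_inline <- lm(a ~ eval(as.name(x)), sample_df)

# Term labels as recorded by each fit
labels_formula <- attr(terms(fit_formula), "term.labels")
labels_inline <- attr(terms(fit_inline), "term.labels")

# Right-hand side of each terms object as a language object
rhs_formula <- terms(fit_formula)[[3]]  # a bare symbol
rhs_inline <- terms(fit_inline)[[3]]    # an unevaluated call

print(labels_formula)
print(labels_inline)
```

In the first fit the right-hand side is already the name of the column, while in the second it is a call that would have to be taken apart and evaluated to recover the column name.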

edited Sep 14 '14 at 17:33

answered Sep 14 '14 at 17:24

IRTFM


I see your point. But what if I want to substitute an expression that is *not* a formula, like in data.table or dplyr? Is deparse really the way to go? – Matthew Sep 14 '14 at 18:31
First thing to do would be to get your terminology in a form that conforms to R usage so we can have an unambiguous discussion. At the moment I cannot tell what you are proposing. In R an expression is not a formula, and a formula is not an expression. – IRTFM Sep 14 '14 at 18:39
Then I'm asking how to use your method in a consistent way across formula and expressions - in my example `lm(v2~v1,DT)` and `DT[,mean:=mean(v1)]` – Matthew Sep 14 '14 at 18:47
The question seems impossibly unfocussed. I cannot figure out which sort of evaluation environment you are examining. Does this question/answer address the dplyr portion of your uncertainty: http://stackoverflow.com/questions/22005419/dplyr-without-hard-coding-the-variable-names ? – IRTFM Sep 14 '14 at 19:05
This is exactly what I do in `dplyr` (second command). This solution works across all examples in `lm` `data.table` and `dplyr`. – Matthew Sep 14 '14 at 19:07
The second answer (not preferred by @hadley) showed the use of setting an environment in `get` which was your first dplyr request. – IRTFM Sep 14 '14 at 19:11
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/61217/discussion-between-matthew-and-bondeddust). – Matthew Sep 14 '14 at 19:15
