R use string to refer to column

Question

I would like to subset a data frame by referring to a column with a string, and select the values of that column that fulfill a condition. Given the following code:

 employee <- c('John Doe','Peter Gynn','Jolie Hope')
 salary <- c(21000, 23400, 26800)
 startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
 employ.data <- data.frame(employee, salary, startdate)
 salary_string <- "salary"

I want to get all salaries over 23000 by using the salary_string to refer to the column name.

I tried the following, without success:

set <- subset(employ.data, salary_string > 23000)
set2 <- employ.data[, employ.data$salary_string > 23000]

This does not seem to work because salary_string is of type character, whereas what I need is some sort of "column name object". Using as.name(salary_string) does not work either. I know I could get the subset by using

set <- subset(employ.data, salary > 23000)

But my goal is to use the column name stored as a character string (salary_string), once with subset(employ.data, ...) and once with employ.data[, ...].
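To see why these attempts fail, here is a minimal sketch that reruns them on the question's data. The claim that `salary_string > 23000` is `TRUE` assumes a typical locale (C or English), where digits sort before letters in string comparisons.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# salary_string is only the string "salary": 23000 is coerced to "23000" and
# the two strings are compared, giving a single TRUE/FALSE, not one per row.
salary_string > 23000

# subset() recycles that single value, so (when it is TRUE) every row is kept.
subset(employ.data, salary_string > 23000)

# $ does not evaluate its argument: it looks for a column literally named
# "salary_string", which does not exist, so the result is NULL.
employ.data$salary_string

# as.name() returns a symbol; a symbol is not a vector, so the comparison
# fails until the symbol is evaluated (looked up) in the data frame.
try(as.name(salary_string) > 23000)
```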

Care to explain the rationale behind what you're trying to achieve? — infominer, Apr 29 '15 at 22:33
I'm creating a function that takes a string/character as input and if the input matches the column name, the function should create a subset of the values of the column that fulfill a condition — Simon, Apr 29 '15 at 22:40
An ugly alternative: `subset(employ.data,eval(parse(text=salary_string)) > 23000)` — Frank, Apr 29 '15 at 22:41
@Simon, got it. It made me practice writing functions using dplyr. Let me know if you need any hints for that – infominer, Apr 29 '15 at 23:49
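For the function Simon describes, one possible shape is sketched below. The name `subset_by_column` and its arguments are hypothetical; it assumes the condition is "greater than a threshold" and that returning the matching rows (rather than only the column values) is what is wanted.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)

# Hypothetical helper: keep the rows of df whose column `col` (given as a
# string) is greater than `threshold`.
subset_by_column <- function(df, col, threshold) {
  # Only proceed if the string matches an existing column name
  if (!col %in% names(df)) {
    stop("no column named '", col, "'")
  }
  # [[ accepts a string and returns the column as a plain vector;
  # which() turns the comparison into row positions and drops NAs
  df[which(df[[col]] > threshold), , drop = FALSE]
}

subset_by_column(employ.data, "salary", 23000)
```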

score 5 · Accepted Answer · answered Apr 29 '15 at 22:32


Short answer: do not use subset, but something like

employ.data[employ.data[salary_string]>23000,]
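A small variation on the line above, sketched here on the question's data: `employ.data[salary_string]` is a one-column data frame, so the comparison produces a one-column logical matrix. Using `[[` instead extracts the column as a plain vector, which is the more conventional row index; the result is the same two rows.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# Single brackets: a one-column data frame, so > returns a logical matrix
employ.data[salary_string] > 23000

# Double brackets: the column itself as a numeric vector
high_paid <- employ.data[employ.data[[salary_string]] > 23000, ]
high_paid
```

If the column can contain NA, wrapping the condition in `which()` avoids rows of NAs in the result.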


cryo111


even shorter `employ.data[employ.data["salary"]>23000,]` – infominer Apr 29 '15 at 22:33

Yes, but I think he wants to use the variable and not the string directly, if I am not mistaken. – cryo111 Apr 29 '15 at 22:34
BTW: Here is the post that explains why it's better to use `[` instead of `subset`. http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – cryo111 Apr 29 '15 at 22:41

Steven Beaupré · Answer 2 · score 3 · 2015-04-29T22:55:20.883

Here's another idea:

dplyr::filter(employ.data, get(salary_string) > 23000)

Which gives:

#    employee salary  startdate
#1 Peter Gynn  23400 2008-03-25
#2 Jolie Hope  26800 2007-03-14
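In current dplyr, a commonly used alternative to `get()` is the `.data` pronoun, sketched below. It assumes dplyr 0.7 or later is installed (the block does nothing otherwise). Unlike `get()`, `.data[[ ]]` only ever looks the name up among the data frame's columns, never in the surrounding environment.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# Runs only when dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  # .data[[ ]] takes a string and resolves it as a column of employ.data
  high_paid <- dplyr::filter(employ.data, .data[[salary_string]] > 23000)
  print(high_paid)
}
```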

edited Apr 29 '15 at 22:55

answered Apr 29 '15 at 22:50

Steven Beaupré


Rich Scriven · Answer 3 · score 2 · 2015-04-29T23:06:30.537

For the sake of showing how to achieve the result with subset():

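The issue you're having arises because subset() uses non-standard evaluation: it evaluates the condition with the data frame's columns available as variables, so a bare column name such as salary works, while a variable holding the string "salary" is simply compared as a string. Here's one way to substitute your string into the subset() call.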

## set up an unevaluated call
e <- call(">", as.name(salary_string), 23000)
## evaluate it in subset()
subset(employ.data, eval(e))
#     employee salary  startdate
# 2 Peter Gynn  23400 2008-03-25
# 3 Jolie Hope  26800 2007-03-14
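The same unevaluated call can also be built with base R's `bquote()` and then evaluated directly against the data frame, which lets you use it with `[` instead of `subset()`. A minimal sketch on the question's data:

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# bquote() fills the .( ) placeholder with the symbol built from the string,
# producing the unevaluated call salary > 23000
e <- bquote(.(as.name(salary_string)) > 23000)
print(e)

# eval() accepts a data frame as its environment, so `salary` is looked up
# among the columns; the logical vector then selects rows with [
keep <- eval(e, employ.data)
employ.data[keep, ]
```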

Or, as Steven suggests, the following would also work well.

subset(employ.data, eval(as.name(salary_string)) > 23000)

edited Apr 29 '15 at 23:06

answered Apr 29 '15 at 22:41

Rich Scriven


Another hacky way that I strongly discourage using ;) `subset(employ.data,eval(parse(text=paste0(salary_string,">","23000"))))` – cryo111 Apr 29 '15 at 22:43

Why not `subset(employ.data, eval(as.name(salary_string)) > 23000)` ? – Steven Beaupré Apr 29 '15 at 22:58

Yeah that too @StevenBeaupré! I'm a little rusty. Been off the grid for a while :) – Rich Scriven Apr 29 '15 at 22:59
Again, I think the examples with `subset` and `eval` are kind of nice if you want to play around with the parsing, but should not be used in production code. You can get in all kinds of trouble. Who knows what R is really doing behind the scenes when it evaluates the `subset` argument? At least I don't. – cryo111 Apr 29 '15 at 23:20
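One concrete example of the trouble mentioned in the comment above: inside a function, subset() looks up names in the data frame first, so a column can silently shadow a function argument. The helper `filter_above` and the data frame `df2` are hypothetical, built only to show the effect; the `[[` version does not have this problem.

```r
# Hypothetical helper built on subset() and eval(as.name())
filter_above <- function(df, col, limit) {
  subset(df, eval(as.name(col)) > limit)
}

# A data frame that happens to have a column called "limit"
df2 <- data.frame(salary = c(21000, 23400, 26800), limit = c(0, 0, 0))

# Inside subset(), `limit` resolves to the column df2$limit (all zeros),
# not the function argument 23000, so all three rows come back
filter_above(df2, "salary", 23000)

# Plain [ / [[ indexing has no such ambiguity: two rows, as intended
df2[df2[["salary"]] > 23000, ]
```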
