
I would like to subset a data frame by referring to a column by its name, stored as a string, and keep only the rows where that column fulfills a condition. Given the following code:

 employee <- c('John Doe','Peter Gynn','Jolie Hope')
 salary <- c(21000, 23400, 26800)
 startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
 employ.data <- data.frame(employee, salary, startdate)
 salary_string <- "salary"

I want to get all salaries over 23000 by using the salary_string to refer to the column name.

I tried without success:

set <- subset(employ.data, salary_string > 23000)
set2 <- employ.data[, employ.data$salary_string > 23000]

This does not seem to work because salary_string is of type character, but what I need is some sort of "column name object". Using as.name(salary_string) does not work either. I know I could get the subset by using

set <- subset(employ.data, salary > 23000)

But my goal is to use the column name that is of type character (salary_string) once with subset(employ.data, ... ) and once with employ.data[, ...]

Simon
  • Care to explain the rationale behind what you're trying to achieve? – infominer Apr 29 '15 at 22:33
  • I'm creating a function that takes a string/character as input and if the input matches the column name, the function should create a subset of the values of the column that fulfill a condition – Simon Apr 29 '15 at 22:40
  • An ugly alternative: `subset(employ.data,eval(parse(text=salary_string)) > 23000)` – Frank Apr 29 '15 at 22:41
  • @Frank Almost the same comment at the same time ;) – cryo111 Apr 29 '15 at 22:46
  • @Simon, got it. It made me practice writing functions using dplyr. Let me know if you need any hints for that – infominer Apr 29 '15 at 23:49

3 Answers


The short answer is: do not use subset(); index with `[` instead, for example

employ.data[employ.data[salary_string]>23000,]
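
Since the goal mentioned in the comments is a function that takes the column name as a string, the same indexing idea can be wrapped up as below. This is only a sketch: the helper name `filter_above` and its arguments are made up for illustration and do not come from the question. It uses `[[` rather than `[` so that the column comes back as a plain vector.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# Hypothetical helper (name and signature are assumptions, not from the post)
filter_above <- function(data, column, threshold) {
  # Stop early if the string does not name exactly one existing column
  stopifnot(is.character(column), length(column) == 1,
            column %in% names(data))
  # data[[column]] extracts the column as a vector, so the comparison
  # yields an ordinary logical vector
  keep <- data[[column]] > threshold
  # which() drops NA comparisons instead of returning all-NA rows
  data[which(keep), , drop = FALSE]
}

result <- filter_above(employ.data, salary_string, 23000)
```

The `column %in% names(data)` check covers the "if the input matches the column name" requirement; what should happen when it does not match (an error here) is a design choice.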
cryo111
  • even shorter `employ.data[employ.data["salary"]>23000,]` – infominer Apr 29 '15 at 22:33
  • Yes, but I think he wants to use the variable and not the string directly, if I am not mistaken. – cryo111 Apr 29 '15 at 22:34
  • BTW: Here is the post that explains why it's better to use `[` instead of `subset`. http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – cryo111 Apr 29 '15 at 22:41

Here's another idea:

dplyr::filter(employ.data, get(salary_string) > 23000)

Which gives:

#    employee salary  startdate
#1 Peter Gynn  23400 2008-03-25
#2 Jolie Hope  26800 2007-03-14
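
One caveat with `get()` is that it can also pick up an object of the same name outside the data frame if the column does not exist. A possible alternative, assuming a dplyr version that provides the `.data` pronoun, is to look the column up through `.data[[...]]`, which only searches the data frame's columns. A minimal sketch:

```r
library(dplyr)

# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# .data[[salary_string]] refers to the column named by the string,
# and only to columns of employ.data
high_paid <- filter(employ.data, .data[[salary_string]] > 23000)
```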
Steven Beaupré

For the sake of showing how to achieve the result with subset():

The issue you're having is because subset() uses non-standard evaluation. Here's one way to substitute your string into the subset() function.

## set up an unevaluated call
e <- call(">", as.name(salary_string), 23000)
## evaluate it in subset()
subset(employ.data, eval(e))
#     employee salary  startdate
# 2 Peter Gynn  23400 2008-03-25
# 3 Jolie Hope  26800 2007-03-14

Or as Steven suggests, the following would also work well.

subset(employ.data, eval(as.name(salary_string)) > 23000)
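
To see why the two attempts in the question fail without raising an error, it can help to evaluate the pieces on their own. The sketch below only reuses the data from the question; the variable names `all_rows`, `missing_col` and `no_cols` are mine.

```r
# Data from the question
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# Attempt 1: subset() looks for salary_string among the columns, does not
# find it, and falls back to the character variable. The number is then
# coerced to a string, so this is a single string comparison that gets
# recycled over all rows.
all_rows <- subset(employ.data, salary_string > 23000)

# Attempt 2: $ takes its argument literally, so this looks for a column
# called "salary_string", which does not exist and yields NULL.
missing_col <- employ.data$salary_string

# Comparing NULL with a number gives a zero-length logical vector,
# so no columns are selected.
no_cols <- employ.data[, missing_col > 23000]
```

In both cases the string is never turned into a reference to the `salary` column, which is what `as.name()` plus `eval()` in the snippets above takes care of.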
Rich Scriven
  • Another hacky way, which I strongly discourage using ;) `subset(employ.data,eval(parse(text=paste0(salary_string,">","23000"))))` – cryo111 Apr 29 '15 at 22:43
  • Why not `subset(employ.data, eval(as.name(salary_string)) > 23000)` ? – Steven Beaupré Apr 29 '15 at 22:58
  • Yeah that too @StevenBeaupré! I'm a little rusty. Been off the grid for a while :) – Rich Scriven Apr 29 '15 at 22:59
  • Again, I think the examples with `subset` and `eval` are kind of nice if you want to play around with the parsing, but should not be used in production code. You can get in all kinds of trouble. Who knows what R is really doing behind the scenes when it evaluates the `subset` argument? At least I don't. – cryo111 Apr 29 '15 at 23:20