I'm trying to use dplyr to filter my data based on a test condition, but this test condition can change depending on other variables.

Using the built-in sample dataset cars:

data(cars)

I'd like to do something like this:

if (foo == 0) {
  test <- speed > 15
} else {
  test <- dist < 50
}
filter(cars, test)

This doesn't work. I can get it to work if I alter it to something like this:

if (foo == 0) {
  test <- 'cars$speed > 15'
} else {
  test <- 'cars$dist < 50'
}
filter(cars, eval(parse(text = test)))

But

  1. Having to type out cars$speed and cars$dist seems to defeat the purpose of using the filter function.
  2. According to this SO answer, using the eval(parse(text = ...)) construction is not recommended.

Is there a better way of achieving this?

Steve

2 Answers

You could do this:

filter(cars, if(foo==0){speed>15}else{dist<50})

Test by comparing with the simple filter:

> foo =0
> identical(filter(cars, speed>15), filter(cars, if(foo==0){speed>15}else{dist<50}))
[1] TRUE
> foo =1
> identical(filter(cars, dist<50), filter(cars, if(foo==0){speed>15}else{dist<50}))
[1] TRUE

It might just be easier and neater to put the filter statement inside the curly brackets:

if (foo == 0) {
  filter(cars, speed > 15)
} else {
  filter(cars, dist < 50)
}

Note that if you want to assign the result, the if expression itself returns a value:

> ff = if (foo == 0) {
       filter(cars, speed > 15)
     } else {
       filter(cars, dist < 50)
     }
> identical(ff, filter(cars, speed>15))
[1] FALSE
> identical(ff, filter(cars, dist<50))
[1] TRUE
> foo
[1] 1
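
If the condition has to live in a variable, as in the question, one option is to store it as an unevaluated expression and inject it into filter(). This is only a sketch, assuming dplyr 0.7 or later (where the `!!` operator is available); quote() is base R, and the variable name test follows the question:

```r
library(dplyr)
data(cars)

foo <- 0

# Store the condition unevaluated, so speed and dist are not looked up yet
test <- if (foo == 0) quote(speed > 15) else quote(dist < 50)

# !! injects the stored expression; filter() then evaluates it inside cars
result <- filter(cars, !!test)
```

With foo set to 0, this should give the same rows as filter(cars, speed > 15), without pasting strings together or calling eval(parse(text = ...)).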
Spacedman

This works for me:

library(dplyr)

if (foo == 0) {
  test <- cars$speed > 15
} else {
  test <- cars$dist < 50
}

filter(cars, test)

I don't see a problem with using cars$speed and cars$dist just because you are using filter. Also, do you really need filter here? Base R offers an alternative: replace the last line with:

cars[test,]
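
One difference between the two forms shows up only when the condition contains missing values, which cars does not have. A small sketch with a made-up data frame (df is hypothetical and exists only for this illustration):

```r
library(dplyr)

# Hypothetical toy data with one missing value, for illustration only
df <- data.frame(speed = c(10, NA, 20))
test <- df$speed > 15                        # FALSE, NA, TRUE

n_filter <- nrow(filter(df, test))           # filter() drops the NA row
n_bracket <- nrow(df[test, , drop = FALSE])  # `[` keeps it as a row of NAs
n_which <- nrow(df[which(test), , drop = FALSE])  # which() drops it in base R
```

Here filter() and the which() form keep one row, while plain logical indexing keeps two.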
Paulo MiraMor
  • Thanks! I agree that `cars[test,]` would be a better solution for this case - this was a simplistic example for the sake of a minimal code snippet for the question though. For some reason, this wasn't working for me when I tried to embed this in a function that takes in a df, column, and value and returns a filtered result set. But it actually seems to be working now using this sort of logic. – Steve Jan 25 '17 at 19:59
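
For the function case mentioned in the comment, a minimal sketch might look like the following. The helper name filter_above and the choice of `>` as the comparison are assumptions made for illustration, since the comment does not show the actual function; the `.data` pronoun is assumed to be available (dplyr 0.7 or later):

```r
library(dplyr)
data(cars)

# Hypothetical helper: keep rows of df where the named column exceeds value
filter_above <- function(df, column, value) {
  # .data[[column]] looks the column up by its name, given as a string
  filter(df, .data[[column]] > value)
}

fast <- filter_above(cars, "speed", 15)
```

Passing the column name as a string keeps the function free of cars$ prefixes.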