Suppose you have a data frame like this:
df <- data.frame(x = 1:10,
                 y = 96:105,
                 z = rep(c("A", "B", "C"), length.out = 10))
And you want to store a list of named predicates you can apply. You can do this simply by storing the logical vectors produced by the predicates in a list:
p <- list(x_is_even = df$x %% 2 == 0,
          y_gt_100  = df$y > 100,
          z_is_A    = df$z == "A")
These can be combined like any other logical conditions:
subset(df, p$x_is_even & p$z_is_A & p$y_gt_100)
#>     x   y z
#> 10 10 105 A
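When every stored condition has to hold at once, the whole list can be collapsed in one step. A minimal sketch, assuming the same df and p as above (repeated here so the block runs on its own); keep is just an illustrative name:

```r
df <- data.frame(x = 1:10,
                 y = 96:105,
                 z = rep(c("A", "B", "C"), length.out = 10))

p <- list(x_is_even = df$x %% 2 == 0,
          y_gt_100  = df$y > 100,
          z_is_A    = df$z == "A")

# Reduce() applies `&` pairwise across the list of logical vectors
keep <- Reduce(`&`, p)

# Rows where all three conditions are TRUE
df[keep, ]
```

Bear in mind that vectors stored this way are tied to the data frame they were computed from; they are not recomputed if df changes.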
If you want to do it in such a way that you can pass "bare" predicates (i.e. those that name columns without naming a data frame) then that is far harder.
The reason is that you would have to store the predicates as language objects. When you come to use these, it doesn't make sense to combine them with logical operators like & or |, because these operations are not defined for language objects.
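To make the problem concrete, here is a small sketch with two quoted expressions. The names e1, e2 and both are illustrative only, and the data frame is repeated so the block runs on its own:

```r
df <- data.frame(x = 1:10,
                 y = 96:105,
                 z = rep(c("A", "B", "C"), length.out = 10))

# Two unevaluated predicates (language objects of type "call")
e1 <- quote(x %% 2 == 0)
e2 <- quote(z == "A")

# Applying `&` directly to the two calls is an error
res <- try(e1 & e2, silent = TRUE)
inherits(res, "try-error")

# Building a new call that applies `&` to both expressions does work,
# and the result can then be evaluated with df supplying x and z
both <- call("&", e1, e2)
df[eval(both, df), ]
```

So the combination has to happen at the level of the expression, before anything is evaluated.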
It is possible, but it requires a bit of computing on the language. I realise you were hoping for something simple, but there is no way to do this simply in base R. I will show how it could be achieved and you can decide whether it is worth the trouble.
First you need a way of creating a list of quoted predicates:
make_subsets <- function(...)
{
  # Drop the function name from the call; what remains are the unevaluated arguments
  as.list(match.call()[-1])
}
So you can do:
p <- make_subsets(x_is_even = x %% 2 == 0,
                  y_gt_100  = y > 100,
                  z_is_A    = z == "A")
p
#> $x_is_even
#> x%%2 == 0
#>
#> $y_gt_100
#> y > 100
#>
#> $z_is_A
#> z == "A"
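Each element of this list is an unevaluated call rather than a logical vector, so it only produces values once it is evaluated with a data frame supplying the column names. A small sketch of that step; it uses base R's alist(), which likewise returns its arguments unevaluated, in place of make_subsets so that the block stands alone (q is an illustrative name):

```r
df <- data.frame(x = 1:10,
                 y = 96:105,
                 z = rep(c("A", "B", "C"), length.out = 10))

# alist() keeps its arguments as unevaluated expressions
q <- alist(x_is_even = x %% 2 == 0,
           y_gt_100  = y > 100)

# Each element is a call, not a logical vector
is.call(q$x_is_even)

# Evaluating a stored predicate with df as the data gives the logical vector
eval(q$x_is_even, df)

# ...which can then be used to index the rows
df[eval(q$y_gt_100, df), ]
```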
Now you need to be able to build these together arbitrarily into a call:
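parse_subsets <- function(expr)
{
  # Capture the unevaluated argument and split it into function + arguments
  expr <- as.list(match.call()$expr)
  this_call <- as.character(expr[[1]])
  # For & and |, resolve both operands recursively and rebuild the call;
  # anything else (e.g. p$x_is_even) is evaluated to fetch the stored predicate
  if(this_call == "&" || this_call == "|")
  {
    l <- unlist(lapply(expr[-1], function(x) {
      eval(as.call(list(parse_subsets, x)))}))
    as.call(append(l, as.symbol(this_call), 0))
  }
  else return(eval(as.call(expr)))
}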
And now you need a function that can take your predicates and filter data:
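subset2 <- function(data, subsets)
{
  # Capture the combined predicate expression without evaluating it
  ss <- match.call()$subsets
  # Expand it into a single call written in terms of the bare column names
  ss <- eval(as.call(list(parse_subsets, ss)))
  # Hand the expanded condition to subset()
  eval(as.call(list(subset, data, ss)))
}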
So now you can do:
subset2(df, p$x_is_even & p$y_gt_100 & p$z_is_A)
#>     x   y z
#> 10 10 105 A
Note, however, that if you want to use lapply on this, you will need to do it the long way:
lapply(list(df, df), function(x) subset2(x, p$x_is_even & p$y_gt_100 & p$z_is_A))
#> [[1]]
#>     x   y z
#> 10 10 105 A
#>
#> [[2]]
#>     x   y z
#> 10 10 105 A
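If the predicates only ever need to be ANDed together, a less general helper avoids most of the machinery and can be passed to lapply directly. This is a minimal sketch rather than a replacement for subset2: subset_all is a hypothetical name, it does not support |, and the predicates are created with alist() so the block runs on its own:

```r
df <- data.frame(x = 1:10,
                 y = 96:105,
                 z = rep(c("A", "B", "C"), length.out = 10))

# Quoted predicates, stored unevaluated
p <- alist(x_is_even = x %% 2 == 0,
           y_gt_100  = y > 100,
           z_is_A    = z == "A")

# Hypothetical helper: AND together every predicate in `preds`, then filter `data`
subset_all <- function(data, preds)
{
  # Fold the list of calls into one nested `&` call
  cond <- Reduce(function(a, b) call("&", a, b), preds)
  # Evaluate the combined condition with the columns of `data` in scope
  data[eval(cond, data), , drop = FALSE]
}

# Works with lapply without an anonymous wrapper
res <- lapply(list(df, df), subset_all, preds = p)
res
```

Selecting a subset of predicates is then just list indexing, for example subset_all(df, p[c("x_is_even", "z_is_A")]).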