Subset df on string condition

Question

I want to subset df on an unknown condition (say, randomly defined as in the example below):

df <- data.frame(a=1:10, b = 10:1)
condition <- paste0(sample(letters[1:2],1), sample(c("<",">"),1), sample(1:10,1))

I can do this with eval, which, vox populi, is suboptimal:

subset(df, eval(parse(text=condition)))

Is there an alternative to eval(parse)?
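For reference, here is a deterministic sketch of what the `eval(parse())` approach does, written without `subset()` (which the comments suggest keeping to interactive use). The string is turned into an unevaluated call and evaluated with the columns of `df` visible as variables. It still parses the string, so it does not answer the question. `str2lang()` assumes R >= 3.6.0, and the fixed condition `"a>7"` replaces the random one only to make the result reproducible.

```r
df <- data.frame(a = 1:10, b = 10:1)
condition <- "a>7"   # fixed here instead of sampled, so the result is reproducible

# Turn the string into an unevaluated call (str2lang() needs R >= 3.6.0),
# then evaluate it with the columns of df visible as variables.
keep <- eval(str2lang(condition), envir = df)
df[keep, ]
```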

By `unknown condition` I guess you mean: "I wrote a condition into a variable `condition` and then apply this condition to a dataset"? – Mbr Mbr, Jul 04 '17 at 08:52
"Unknown" as in programmatically defined: impossible to know beforehand. That's what the `sample` was intended to signify. – InspectorSands, Jul 04 '17 at 08:52
A tiny simplification would be `subset(df, eval(parse(text=condition)))` (though `subset` should only be used in interactive mode) — talat, Jul 04 '17 at 08:56
Thanks, @Cath. I did read it: that's why I am looking for alternatives to `parse`. Do you suggest an answer using `substitute()` or `quote()`? — InspectorSands, Jul 04 '17 at 09:23
well tbh right now I don't know how but there must be a way ;-) — Cath, Jul 04 '17 at 12:08

Axeman · Accepted Answer · score 2 · 2017-07-05T08:38:57.420

With a slight adaptation to your script, it becomes more feasible:

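# Keep the parts of the condition as separate objects instead of one string
condition  <- list(value1 = sample(letters[1:2], 1),          # column name
                   comp =   sample(c(`<`, `>`), 1)[[1]],      # comparison function
                   value2 = sample(1:10, 1))                   # threshold

# Call the comparison function directly on the selected column
subset(df, condition$comp(df[, condition$value1], condition$value2))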

So it depends on what constraints there are on how your condition is passed.

(Note that using `subset` outside interactive use might be a bad idea.)
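If the condition can only arrive as a single string, one option is to make the constraint explicit: assume it always has the simple form column, comparison operator, number (e.g. `"b>7"`), split it with a regular expression into the same three parts as the list above, and look the operator up with `match.fun()`. The string is then never evaluated as R code, and anything outside the assumed format is rejected. `split_condition()` and `subset_by_condition()` are hypothetical names, and the regex is an assumption about the input format; this is a sketch, not a general parser.

```r
df <- data.frame(a = 1:10, b = 10:1)

# Hypothetical helper: split a string such as "a<5" into column, operator
# and numeric value. Assumes a word-character column name and one of the
# operators <=, >=, ==, !=, < or >. Anything else is rejected.
split_condition <- function(condition) {
  pattern <- "^\\s*(\\w+)\\s*(<=|>=|==|!=|<|>)\\s*(-?[0-9.]+)\\s*$"
  m <- regmatches(condition, regexec(pattern, condition))[[1]]
  if (length(m) == 0) stop("unsupported condition: ", condition)
  list(value1 = m[2], comp = match.fun(m[3]), value2 = as.numeric(m[4]))
}

# Hypothetical helper: apply the parsed parts without evaluating the string
subset_by_condition <- function(data, condition) {
  parts <- split_condition(condition)
  if (!parts$value1 %in% names(data)) stop("unknown column: ", parts$value1)
  keep <- parts$comp(data[[parts$value1]], parts$value2)
  data[which(keep), , drop = FALSE]
}

subset_by_condition(df, "b>7")
subset_by_condition(df, "a <= 2")
```

Using `which()` drops rows where the comparison is `NA`, mirroring what `subset()` does.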

edited Jul 05 '17 at 08:38 · answered Jul 04 '17 at 09:00 · Axeman

So, how would a condition like `a*a*a-sin(2*pi*b)<0` be covered? Are we talking only about linear combinations? – amonk Jul 04 '17 at 09:03
I am assuming OP used that method to make a reproducible example; in reality they might be getting the condition directly as a string. Maybe OP can clarify. – Ronak Shah Jul 04 '17 at 09:03
@Axeman, that's really useful. Indeed, I provided the condition as a string just as an example. – InspectorSands Jul 04 '17 at 09:12
Actually, this doesn't work since `condition$comp(condition$value1, condition$value2)` returns a single boolean value. `subset(df, condition$comp(df[,condition$value1], condition$value2))` might be an alternative. – InspectorSands Jul 04 '17 at 10:17
@Axeman, I'll be happy to accept the answer if you change this to make it work. – InspectorSands Jul 05 '17 at 07:44

score 1 · Answer 2 · answered Jul 04 '17 at 09:04

If some constraints can be introduced, such as the data frame having only numeric columns and the conditions being linear, you could express each condition as the dot product of each row with a weight vector, minus a constant:

# a > b
condition.mat <- c(1, -1)
condition.const <- 0

# b > 4
# condition.mat <- c(0, 1)
# condition.const <- 4

dec <- as.matrix(df) %*% condition.mat - condition.const
sel <- dec > 0

print(df[sel,])
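A minimal sketch of the dot-product idea wrapped in a function, assuming (as above) that every column is numeric and that the weight vector has one entry per column. `linear_subset()` is a hypothetical name; `drop()` turns the one-column matrix returned by `%*%` into a plain vector before comparing.

```r
df <- data.frame(a = 1:10, b = 10:1)

# Hypothetical helper: keep rows where (row . w) - const > 0.
# Assumes all columns of `data` are numeric.
linear_subset <- function(data, w, const = 0) {
  stopifnot(length(w) == ncol(data))
  dec <- drop(as.matrix(data) %*% w) - const
  data[dec > 0, , drop = FALSE]
}

linear_subset(df, c(1, -1))      # a > b
linear_subset(df, c(0, 1), 4)    # b > 4
```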

Surak of Vulcan

Thanks, Surak. This is great to know. Unfortunately my example was misleading, since the columns of my df are actually character. – InspectorSands Jul 04 '17 at 09:15

score 1 · Answer 3 · answered Jul 04 '17 at 09:25

An alternative to the base subset is the filter function from dplyr:

df <- data.frame(a=1:10, b = 10:1)
condition <- paste0(sample(letters[1:2],1), sample(c("<",">"),1), sample(1:10,1))

library(dplyr)
df %>% filter(eval(parse(text = condition)))

HNSKD

I was actually looking to bypass `eval(parse)` altogether. – InspectorSands Jul 04 '17 at 09:27

score 1 · Answer 4 · answered Jul 04 '17 at 10:26

Just one thought. There are other ways to keep the code dynamic without the (nasty) "character-expression", for example:

df <- data.frame(a=1:10, b = 10:1)

# Keep the rows of the global df where f(df[[x1]], x2) is TRUE
mysubset <- function (f,x1,x2) {
  df[f(df[[x1]],x2),]
}

mycol <- sample(letters[1:2],1) 
myfun <- sample(c("<",">"),1)
mylimit <- sample(1:10,1)

mysubset(.Primitive(myfun),mycol,mylimit) # in my mind just as dynamic as eval-parse ..

mysubset(`<`,"a",4)
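A variant of `mysubset()` that passes the data frame in explicitly instead of reading the global `df`, and looks the function up by name with `match.fun()`. Unlike `.Primitive()`, which only finds primitives such as `<` and `>`, `match.fun()` also finds closures such as `%in%`, and the comparison works on character columns too. `mysubset2()` and the extra column `g` are hypothetical, added only for illustration.

```r
df <- data.frame(a = 1:10, b = 10:1, g = rep(c("x", "y"), 5))

# Hypothetical variant: data frame passed explicitly, operator given by name
mysubset2 <- function(data, op, col, value) {
  f <- match.fun(op)                  # finds primitives and closures alike
  keep <- f(data[[col]], value)
  data[which(keep), , drop = FALSE]   # which() drops NA, as subset() does
}

mysubset2(df, "<", "a", 4)
mysubset2(df, "==", "g", "y")         # character column
mysubset2(df, "%in%", "b", c(2, 3))   # %in% is a closure, not a primitive
```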
