4

I want to subset `df` on a condition that is not known in advance (for example, one generated at random as below):

df <- data.frame(a=1:10, b = 10:1)
condition <- paste0(sample(letters[1:2],1), sample(c("<",">"),1), sample(1:10,1))

I can do this with `eval(parse())`, which is widely considered bad practice:

subset(df, eval(parse(text=condition)))

Is there an alternative to eval(parse)?

InspectorSands
  • What do you mean by `unknown condition`? – amonk Jul 04 '17 at 08:49
  • By `unknown condition` I guess you mean: "I wrote a condition in a variable `condition` and then apply this condition to a dataset"? – Mbr Mbr Jul 04 '17 at 08:52
  • "Unknown" as in programmatically defined: impossible to know beforehand. That's what the `sample` was intended to signify. – InspectorSands Jul 04 '17 at 08:52
  • A tiny simplification would be `subset(df, eval(parse(text=condition)))` (though `subset` should only be used in interactive mode). – talat Jul 04 '17 at 08:56
  • Worth reading: https://stackoverflow.com/a/40164111/4137985 – Cath Jul 04 '17 at 09:04
  • Thanks, @Cath. I did read it: that's why I am looking for alternatives to `parse`. Do you suggest an answer using `substitute()` or `quote()`? – InspectorSands Jul 04 '17 at 09:23
  • Well, to be honest, right now I don't know how, but there must be a way ;-) – Cath Jul 04 '17 at 12:08

4 Answers

2

With a slight adaptation to your script it becomes more feasible:

condition  <- list(value1 = sample(letters[1:2], 1),
                   comp =   sample(c(`<`, `>`), 1)[[1]],
                   value2 = sample(1:10, 1))

subset(df, condition$comp(df[, condition$value1], condition$value2))

So it depends on the constraints on how your condition is passed.

(Note that using subset might be a bad idea)
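
As a minimal sketch of the same idea without `subset`, the logical row selector can be built explicitly and passed to `[`. The condition is fixed here (column `a`, operator `>`, value 7) purely so the result is reproducible; those values are assumptions for illustration, not part of the original answer.

```r
df <- data.frame(a = 1:10, b = 10:1)

# Same three-part condition as above, with fixed (illustrative) values.
condition <- list(value1 = "a", comp = `>`, value2 = 7)

# Apply the stored comparison function to the chosen column.
keep <- condition$comp(df[[condition$value1]], condition$value2)

# Index rows with `[`; drop = FALSE keeps the result a data frame.
result <- df[keep, , drop = FALSE]
print(result)
```

One difference worth keeping in mind: `subset` drops rows where the condition is `NA`, whereas `[` returns `NA` rows for them, so wrapping the selector in `which(keep)` may be preferable if the column can contain missing values.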

Axeman
  • So, how is the `a*a*a-sin(2*pi*b)<0` condition covered? Are we talking about linear combinations? – amonk Jul 04 '17 at 09:03
  • I am assuming OP used that method to make a reproducible example; in reality they might be getting the condition directly as a string. Maybe OP can clarify. – Ronak Shah Jul 04 '17 at 09:03
  • @Axeman, that's really useful. Indeed, I provided the condition as a string just as an example. – InspectorSands Jul 04 '17 at 09:12
  • Actually, this doesn't work since `condition$comp(condition$value1, condition$value2)` returns a single boolean value. `subset(df, condition$comp(df[,condition$value1], condition$value2))` might be an alternative. – InspectorSands Jul 04 '17 at 10:17
  • @Axeman, I'll be happy to accept the answer if you change this to make it work. – InspectorSands Jul 05 '17 at 07:44
1

If there can be some constraints introduced, such as the dataframe only having numeric columns, and only linear conditions, you could formulate the decision on condition as dot products:

# a > b
condition.mat <- c(1, -1)
condition.const <- 0

# b > 4
# condition.mat <- c(0, 1)
# condition.const <- 4

dec <- as.matrix(df) %*% condition.mat - condition.const
sel <- dec > 0

print(df[sel,])
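
The dot-product formulation above could be wrapped in a small helper so that both example conditions can be checked side by side. This is only a sketch: `linear_filter` and its argument names are hypothetical, and it assumes every column of the data frame is numeric.

```r
df <- data.frame(a = 1:10, b = 10:1)

# Hypothetical helper: keep rows where sum(coef * row) - const > 0.
linear_filter <- function(data, coef, const) {
  stopifnot(length(coef) == ncol(data))
  dec <- as.matrix(data) %*% coef - const
  data[as.vector(dec > 0), , drop = FALSE]
}

a_gt_b <- linear_filter(df, coef = c(1, -1), const = 0)  # a > b
b_gt_4 <- linear_filter(df, coef = c(0, 1), const = 4)   # b > 4

print(a_gt_b)
print(b_gt_4)
```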
Surak of Vulcan
  • Thanks, Surak. This is great to know. Unfortunately my example was misleading, since the columns of my df are actually character. – InspectorSands Jul 04 '17 at 09:15
1

An alternative to the base subset is the filter function from dplyr:

df <- data.frame(a=1:10, b = 10:1)
condition <- paste0(sample(letters[1:2],1), sample(c("<",">"),1), sample(1:10,1))

library(dplyr)
df %>% filter(eval(parse(text = condition)))
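
If the column name, operator and value are available separately rather than as one string, `filter` can probably be used without `eval(parse())` at all by going through dplyr's `.data` pronoun. The sketch below assumes dplyr is installed; `col_name`, `op` and `threshold` are illustrative names with fixed values, not something taken from the question.

```r
library(dplyr)

df <- data.frame(a = 1:10, b = 10:1)

# Illustrative, fixed pieces of the condition "b > 6".
col_name  <- "b"
op        <- match.fun(">")   # look up the comparison function by name
threshold <- 6

# .data[[col_name]] refers to the column whose name is stored in col_name.
result <- df %>% filter(op(.data[[col_name]], threshold))
print(result)
```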
HNSKD
1

Just one thought. There are other ways to keep the code dynamic without the (nasty) "character-expression", for example:

df <- data.frame(a=1:10, b = 10:1)

mysubset <- function (f,x1,x2) {
  df[f(df[[x1]],x2),]
}

mycol <- sample(letters[1:2],1) 
myfun <- sample(c("<",">"),1)
mylimit <- sample(1:10,1)

mysubset(.Primitive(myfun),mycol,mylimit) # in my mind just as dynamic as eval-parse ..

mysubset(`<`,"a",4) 
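
If the condition really does arrive as a single string, one possible way to stay clear of `eval(parse())` is to split it into column, operator and value with a regular expression and reject anything that does not fit. The following is a sketch under stated assumptions: `filter_by_string` is a hypothetical helper, it only accepts the form `<column> <operator> <number>`, and it assumes the column is numeric.

```r
df <- data.frame(a = 1:10, b = 10:1)

# Hypothetical helper: accepts only "<column> <op> <number>".
filter_by_string <- function(data, condition) {
  pattern <- paste0(
    "^[[:space:]]*([[:alnum:]._]+)[[:space:]]*",   # column name
    "(<=|>=|==|!=|<|>)[[:space:]]*",               # whitelisted operators
    "(-?[0-9]+(\\.[0-9]+)?)[[:space:]]*$"          # numeric literal
  )
  parts <- regmatches(condition, regexec(pattern, condition))[[1]]
  if (length(parts) == 0) stop("unsupported condition: ", condition)

  col_name <- parts[2]
  if (!col_name %in% names(data)) stop("unknown column: ", col_name)

  op    <- match.fun(parts[3])   # only operators matched by the regex reach here
  value <- as.numeric(parts[4])
  data[op(data[[col_name]], value), , drop = FALSE]
}

res <- filter_by_string(df, "a<4")
print(res)
```

Because the string is never parsed as R code, input such as `"system('ls')"` is rejected with an error instead of being executed.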
r.user.05apr