Dynamic subset condition in R

Question

I'm trying to implement a function that takes a dynamic subset of a data frame based on a list of column names of any length.

The static code is:

s <- c("s0","s1","s2")
d.subset <- d[ d$s0 > 0 | d$s1 > 0 | d$s2 > 0,]

However, I want to generate the d$s0 > 0 | d$s1 > 0 | d$s2 > 0 part from s. I tried using as.formula() to generate it, but it gave me an "invalid formula" error.

score 5 · Accepted Answer · answered Jan 17 '13 at 16:33


An example data frame:

d <- data.frame(s0 = c(0,1,0,0), s1 = c(1,1,1,0), s2 = c(0,1,1,0))

s <- c("s0","s1","s2")

Here is an easy solution with rowSums:

d[as.logical(rowSums(d[s] > 0)), ]

The result:

  s0 s1 s2
1  0  1  0
2  1  1  1
3  0  1  1
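The rowSums() idiom above can be wrapped in a small reusable function. This is a minimal sketch; the name subset_any_positive is hypothetical, and the na.rm = TRUE and drop = FALSE choices are assumptions added for robustness, not part of the original answer.

```r
# Example data from the answer
d <- data.frame(s0 = c(0, 1, 0, 0), s1 = c(1, 1, 1, 0), s2 = c(0, 1, 1, 0))
s <- c("s0", "s1", "s2")

# Hypothetical helper: keep the rows where any column named in `cols` is > 0
subset_any_positive <- function(df, cols) {
  # df[cols] > 0 is a logical matrix with one column per name in `cols`;
  # rowSums() counts the TRUE entries in each row (na.rm = TRUE treats an
  # NA comparison as "not > 0")
  keep <- rowSums(df[cols] > 0, na.rm = TRUE) > 0
  # drop = FALSE keeps a data frame even when df has a single column
  df[keep, , drop = FALSE]
}

subset_any_positive(d, s)
```

Because the comparison happens before the counting, the function works for any number of names in s and for negative values as well.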

answered Jan 17 '13 at 16:33

Sven Hohenstein


@SvenHohenstein, hmm, not sure I get you. `rowSums(d[s])` would give you `c(1, 3, 2, 0)`, and `rowSums(d[s]) > 0` would give you `c(TRUE, TRUE, TRUE, FALSE)`, a logical value for each row determined by whether any of the columns are > 0. – Matthew Plourde Jan 17 '13 at 16:45
@MatthewPlourde Sorry, this was a mistake. I played around with some examples. Of course, the result of `rowSums` here is `c(1, 3, 2, 0)`. Now, I also realised the different position of the brackets in your comment. Agreed, a good idea as long as the data frame contains non-negative values only. – Sven Hohenstein Jan 17 '13 at 16:51
@SvenHohenstein The length of the value returned by `rowSums` will always be equal to the number of rows in the `data.frame`.... – Matthew Plourde Jan 17 '13 at 16:52
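The comments above turn on where `> 0` is placed. A minimal sketch with made-up data containing a negative value (d_neg is a hypothetical name) shows that comparing first and summing first only agree when all values are non-negative:

```r
# Hypothetical data: row 1 has one positive column but a negative sum
d_neg <- data.frame(s0 = c(-2, 0), s1 = c(1, 0), s2 = c(0, 0))
s <- c("s0", "s1", "s2")

# Compare first, then count: row 1 TRUE, row 2 FALSE
any_positive <- rowSums(d_neg[s] > 0) > 0

# Sum first, then compare: row 1 FALSE (-2 + 1 + 0 = -1), row 2 FALSE
sum_positive <- rowSums(d_neg[s]) > 0
```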

score 2 · Answer 2 · edited May 23 '17 at 12:22


Your code isn't reproducible, so this is a shot in the dark at what you want. I think you want to use indexing rather than the $ operator:

s <- c("s0","s1","s2")
d.subset <- d[ d[, s[1]] > 0 | d[, s[2]] > 0 | d[, s[3]] > 0,]
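The line above still hard-codes three names. One way to extend the same indexing idea to an s of any length is to build one logical vector per name and fold them together with Reduce(). This is a sketch under the assumption that d looks like the example data frame from the accepted answer (the question does not show d):

```r
# Assumed example data (the question does not show `d`)
d <- data.frame(s0 = c(0, 1, 0, 0), s1 = c(1, 1, 1, 0), s2 = c(0, 1, 1, 0))
s <- c("s0", "s1", "s2")

# One logical vector per column name: d[, "s0"] > 0, d[, "s1"] > 0, ...
conditions <- lapply(s, function(col) d[, col] > 0)

# Fold the vectors together with `|`, i.e. cond1 | cond2 | cond3 | ...
keep <- Reduce(`|`, conditions)

d.subset <- d[keep, ]
```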


answered Jan 17 '13 at 16:30

Tyler Rinker


score 1 · Answer 3 · answered Jul 18 '14 at 01:17

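Inspired by the answer from @sven-hohenstein, here is a generalised function that filters a data frame based on a list of predicates, specified in the form column=list(binary_operator, rhs) (e.g. x=list(`<=`, 3) for x <= 3). A bare value such as x=2L is treated as an equality test (x == 2L), and a row is kept only if all predicates are TRUE.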

#' Filter a data frame dynamically
#'
#' @param df data frame to filter
#' @param controls list of filters (with optional operators)
filter_data = function(df, controls) {

  evaluate = function(predicate, value) {
    if (is.list(predicate)) {
      operator = predicate[[1L]]
      rhs = predicate[[2L]]
    } else {
      operator = `==`
      rhs = predicate
    }
    return(operator(value, rhs))
  }

  index = apply(
    mapply(evaluate, predicate=controls, value=df[names(controls)]), 1L, all
  )

  return(df[index, ])

}

Here is an example using the filtering function to apply the condition x == 2 & y <= 2.5 & z != 'C':

# create example data
df = data.frame(
  x=sample(1:3, 100L, TRUE),
  y=runif(100L, 1, 5),
  z=sample(c('A', 'B', 'C'), 100L, TRUE)
)

controls = list(x=2L, y=list(`<=`, 2.5), z=list(`!=`, 'C'))

filter_data(df, controls)
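One caveat with apply(mapply(...), 1L, all): mapply() only simplifies to a matrix when each predicate returns more than one value, so a one-row data frame gives a plain vector and apply() fails. The sketch below is a hypothetical variant (filter_data2 is not from the original answer) that combines the predicates with Reduce(`&`, Map(...)) instead, on small deterministic data so the result can be checked:

```r
# Hypothetical variant of filter_data using Map() + Reduce()
filter_data2 <- function(df, controls) {
  evaluate <- function(predicate, value) {
    if (is.list(predicate)) {
      operator <- predicate[[1L]]
      rhs <- predicate[[2L]]
    } else {
      operator <- `==`  # a bare value means equality
      rhs <- predicate
    }
    operator(value, rhs)
  }
  # Map() never simplifies: one logical vector per control, even for 1 row
  conditions <- Map(evaluate, controls, df[names(controls)])
  # A row is kept only if every predicate is TRUE
  index <- Reduce(`&`, conditions)
  # which() drops rows where a predicate is NA (assumed desired behaviour)
  df[which(index), , drop = FALSE]
}

df <- data.frame(
  x = c(2L, 2L, 1L, 2L),
  y = c(1.5, 3.0, 2.0, 2.5),
  z = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)
controls <- list(x = 2L, y = list(`<=`, 2.5), z = list(`!=`, "C"))

# Only row 1 satisfies x == 2 & y <= 2.5 & z != "C"
filter_data2(df, controls)
```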

score 0 · Answer 4 · edited May 23 '17 at 11:51


(EDIT: This solution is strongly discouraged. Please see the comments below and the linked Stack Overflow question on the dangers of eval(parse()) for details.)

I just learned this trick: write the whole expression as a character string and evaluate it with eval(parse(text = ...)). It is perhaps not the best approach for this example, but it can be used more generally.

s <- c("s0","s1","s2")
s.1 <- paste0("d$",s," > 0",collapse=" | ")
d.subset <- eval(parse(text=paste0("d[",s.1,",]")))
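If you do want to generate the s0 > 0 | s1 > 0 | s2 > 0 expression itself, as the question asks, one alternative to parsing text is to build it directly as an R call with call() and as.name(), then evaluate it with the data frame as the environment. This is a sketch of that technique (computing on the language), not the approach used in this answer; d is assumed to be the example data frame from the accepted answer.

```r
# Assumed example data (from the accepted answer)
d <- data.frame(s0 = c(0, 1, 0, 0), s1 = c(1, 1, 1, 0), s2 = c(0, 1, 1, 0))
s <- c("s0", "s1", "s2")

# One call per column, e.g. s0 > 0, built from symbols rather than strings
comparisons <- lapply(s, function(col) call(">", as.name(col), 0))

# Chain them with |, equivalent to s0 > 0 | s1 > 0 | s2 > 0
condition <- Reduce(function(lhs, rhs) call("|", lhs, rhs), comparisons)

# Evaluate with the columns of d in scope. Note: a name missing from d
# would be looked up in the calling environment instead.
d.subset <- d[eval(condition, d), ]
```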


answered Jan 17 '13 at 16:48

Blue Magister



-1 This is hideous. To quote R core contributor Thomas Lumley, "If the answer is parse() you should usually rethink the question". Additional reading: http://stackoverflow.com/questions/13649979/what-specifically-are-the-dangers-of-evalparse – Matthew Plourde Jan 17 '13 at 17:01
Thanks - I got this off of an old data.table question without knowing any of the pitfalls. I've learned something new today. – Blue Magister Jan 17 '13 at 21:10
If you feel like adding anything, I'd be glad to remove the downvote. – Matthew Plourde Jan 17 '13 at 21:27
Something like "EDIT: This is never recommended"? – Blue Magister Jan 17 '13 at 21:55
