Applying vectorized subsetting across multiple columns in R

Question

I am trying to find a straightforward way to vectorize (generalize) the subsetting of a data.frame. Suppose I have this data.frame:

df <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)

Every column has its own condition, and the goal is to subset df so that only the rows where the condition holds for at least one column remain. I want a vectorized subsetting mechanism that generalizes

df <- subset(df, df[, 1] < 2 | df[, 2] < 30 | df[, 3] < 100)

so that I could write it roughly like this (pseudocode):

crit <- c(2,30,100)
df <- subset(df, df$header < crit[1:3])

and, eventually, extend it to n columns:

df <- subset(df, df$header < crit[1:n])

I know a multi-step loop workaround, but there must be another way. I am grateful for any help.

Side note: `df <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)` is easier than using 5 lines to create your data frame ;-) — Rich Scriven, Jan 15 '16 at 22:16
Thank you so much! I was so far down the subset rabbit hole that I forgot about mapply. — Unit Root, Jan 15 '16 at 22:29
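The comments refer to an `mapply` solution by rawr that is not reproduced on this page. As a hedged sketch of what an `mapply`-based approach can look like (not necessarily rawr's exact code): because a data.frame is a list of columns, `mapply()` can pair each column with its threshold and return one logical column per criterion. The name `mydf` is used instead of `df` so the stats function `df()` is not masked.

```r
mydf <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)
crit <- c(2, 30, 100)

# mapply() walks the columns of mydf and the elements of crit in parallel,
# calling `<` on each (column, threshold) pair; the results are simplified
# into a 5 x 3 logical matrix
hits <- mapply(`<`, mydf, crit)

# keep rows where at least one column is below its threshold
result <- mydf[rowSums(hits) > 0, ]
result
```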

score 4 · Accepted Answer · answered Jan 15 '16 at 22:22

Given:

x <- 1:5
y <- c(10, 20, 30, 40, 50)
z <- c(100, 200, 300, 400, 500)

# avoid the name df: df() already exists (the F-distribution density in stats)
mydf <- data.frame(A = x, B = y, C = z)

crit <- c(2, 30, 100)

Then this shows which values in each column are less than that column's crit value:

> sweep(mydf, 2, crit, "<")
         A     B     C
[1,]  TRUE  TRUE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
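For readers unfamiliar with `sweep()`: with `MARGIN = 2` it lines up `crit[j]` against every value in column `j`. A minimal sketch of the same comparison written with explicit recycling, assuming the `mydf` and `crit` defined above (redefined here so the block runs on its own):

```r
mydf <- data.frame(A = 1:5, B = c(10, 20, 30, 40, 50), C = c(100, 200, 300, 400, 500))
crit <- c(2, 30, 100)

# Expand crit into a matrix with the same shape as mydf:
# rep(..., each = nrow(mydf)) repeats crit[1] nrow(mydf) times, then crit[2],
# then crit[3], and matrix() fills column by column, so column j holds crit[j]
thresholds <- matrix(rep(crit, each = nrow(mydf)), nrow = nrow(mydf))

# Element-wise comparison, equivalent to sweep(mydf, 2, crit, "<")
below <- as.matrix(mydf) < thresholds
below
```

`sweep()` performs this expansion internally, which is why the one-liner above is shorter.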

And this will give you the rows that meet any of the criteria:

> subset(mydf, rowSums(sweep(mydf, 2, crit, "<")) > 0)
  A  B   C
1 1 10 100
2 2 20 200
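To reach the `crit[1:n]` form from the question, the sweep-plus-rowSums idiom can be wrapped in a small helper. The name `subset_any_below` is hypothetical, and the sketch assumes all columns are numeric and `crit` holds exactly one threshold per column, in column order:

```r
# Hypothetical helper: keep rows where at least one column is below its threshold.
# Assumes every column of `data` is numeric and crit[j] is the threshold for column j.
subset_any_below <- function(data, crit) {
  stopifnot(length(crit) == ncol(data))
  below <- sweep(data, 2, crit, "<")        # logical matrix, one column per criterion
  data[rowSums(below) > 0, , drop = FALSE]  # drop = FALSE keeps a data frame even for 1 column
}

mydf <- data.frame(A = 1:5, B = c(10, 20, 30, 40, 50), C = c(100, 200, 300, 400, 500))
subset_any_below(mydf, c(2, 30, 100))

# The same call works for any number of columns n
wide <- as.data.frame(matrix(1:20, nrow = 4))  # 4 rows, 5 columns V1..V5
subset_any_below(wide, c(2, 0, 0, 0, 0))
```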

This works too; the mapply solution by rawr is a little faster, at least on my machine. — Unit Root, Jan 15 '16 at 22:31

score 1 · Answer 2 · by mbiron

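This should also work: `apply()` tests each row against `crit` and keeps the rows where any value is below its threshold.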

> df[apply(df, 1, function(x){any(x < crit)}), ]
  A  B   C
1 1 10 100
2 2 20 200
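Note that `apply()` first converts the data frame to a matrix (coercing all columns to a common type) and calls the anonymous function once per row. A column-wise alternative, offered here as a swapped-in technique rather than something from either answer, uses `Map()` to compare each column with its threshold and `Reduce()` to combine the results with `|`:

```r
mydf <- data.frame(A = 1:5, B = c(10, 20, 30, 40, 50), C = c(100, 200, 300, 400, 500))
crit <- c(2, 30, 100)

# Map() pairs each column with its threshold and returns a list of
# logical vectors, one per column (each of length nrow(mydf))
per_column <- Map(`<`, mydf, crit)

# Reduce() folds the list with `|`, giving TRUE for rows where
# at least one column is below its threshold
keep <- Reduce(`|`, per_column)

mydf[keep, ]
```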

answered Jan 15 '16 at 22:21 by mbiron · edited Jan 16 '16 at 01:58