
I am trying to find a straightforward way to vectorize/generalize the subsetting of a data.frame. Let's assume I have a data.frame:

df <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)

Every column has its own condition, and the goal is to subset the df so that only those rows remain where the condition is met for at least one column. I want to find a vectorized subsetting mechanism that generalizes

df <- subset(df, df[,1]<2 | df[,2]< 30 | df[,3]<100)

so that I could write it roughly like this (pseudocode):

crit <- c(2,30,100)
df <- subset(df, df$header < crit[1:3])

and, down the road, generalize it to n columns:

df <- subset(df, df$header < crit[1:n])

I know a multi-step loop workaround, but there must be another way. I am grateful for any help.

smci
Unit Root

2 Answers


Given:

x <- 1:5
y <- c(10,20,30,40,50)
z <- c(100,200,300,400,500)

# df is already a base R function (the F-distribution density), so use another name
mydf <- data.frame(A = x, B = y, C = z)

crit <- c(2,30,100)

Then this shows which values in each column are less than that column's crit value:

> sweep(mydf, 2, crit, "<")
         A     B     C
[1,]  TRUE  TRUE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
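To see what `sweep()` is doing here, this sketch (an illustration, not part of the original answer) builds the same comparison by hand: `crit` is repeated once per row into a matrix and compared element-wise, so column j is tested against `crit[j]`. The name `crit_mat` is made up for the example.

```r
mydf <- data.frame(A = 1:5, B = c(10, 20, 30, 40, 50), C = c(100, 200, 300, 400, 500))
crit <- c(2, 30, 100)

# Repeat crit once per row so that every row is c(2, 30, 100)
# (crit_mat is an illustrative name, not from the answer)
crit_mat <- matrix(crit, nrow = nrow(mydf), ncol = length(crit), byrow = TRUE)

# Element-wise comparison: column j is compared with crit[j]
manual <- as.matrix(mydf) < crit_mat

# sweep() along MARGIN = 2 (columns) performs the same comparison
swept <- sweep(mydf, 2, crit, "<")

# Compare values only; dimnames may differ between the two results
all(unname(manual) == unname(as.matrix(swept)))  # TRUE
```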

And this will give you the rows that meet any of the criteria:

> subset(mydf, rowSums(sweep(mydf, 2, crit, "<")) > 0)

  A  B   C
1 1 10 100
2 2 20 200
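The `rowSums(...) > 0` step works because `TRUE` counts as 1 when summed, so any row with at least one `TRUE` has a positive sum. Wrapping this approach in a function makes the n-column case from the question explicit. This is a sketch with a hypothetical helper name, `subset_any_below`, and it assumes `crit` holds exactly one threshold per column, in column order.

```r
# Hypothetical helper (not from the answer): keep the rows of `data` where
# data[[j]] < crit[j] holds for at least one column j.
subset_any_below <- function(data, crit) {
  # Assumes exactly one threshold per column, in column order
  stopifnot(length(crit) == ncol(data))
  # Logical matrix: TRUE where a value is below its column's threshold
  hits <- sweep(data, 2, crit, "<")
  # TRUE counts as 1, so a positive row sum means "at least one TRUE"
  data[rowSums(hits) > 0, , drop = FALSE]
}

mydf <- data.frame(A = 1:5, B = c(10, 20, 30, 40, 50), C = c(100, 200, 300, 400, 500))
subset_any_below(mydf, c(2, 30, 100))  # rows 1 and 2

# The same call works for any number of columns
wide <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = c(9, 0, 9))
subset_any_below(wide, c(0, 0, 0, 1))  # only row 2 (D = 0 < 1)
```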
Phiala

This should also work, testing each row with `apply()`:

> df[apply(df, 1, function(x){any(x < crit)}), ]
  A  B   C
1 1 10 100
2 2 20 200
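One caveat with `apply()`: it first converts the data frame to a matrix with `as.matrix()`, so if any column is character, every value becomes character and `x < crit` turns into a string comparison. A column-wise variant (not from either answer) avoids that conversion by comparing each column with its own threshold via base R's `Map()` and OR-ing the results with `Reduce()`. As in the question, it assumes `crit` lists one threshold per column, in column order.

```r
df <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)
crit <- c(2, 30, 100)

# Map() walks df's columns and crit in parallel and returns a list of
# logical vectors, one per column (column j compared with crit[j])
per_column <- Map(`<`, df, crit)

# OR the per-column results together element-wise: TRUE for rows where
# at least one column is below its threshold
keep <- Reduce(`|`, per_column)

df[keep, ]  # rows 1 and 2
```

Because each comparison stays inside a single column, columns keep their own types, and the work is vectorized per column rather than looped per row.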
mbiron