Here is a faster method than apply, using max.col, matrix subsetting, and logical subsetting.
First, construct a sample dataset.
set.seed(1234)
dat <- data.frame(a=sample(1:3, 5, replace=TRUE),
b=sample(1:4, 5, replace=TRUE),
c=sample(1:6, 5, replace=TRUE))
It looks like this.
dat
a b c
1 1 3 5
2 2 1 4
3 2 1 2
4 2 3 6
5 3 3 2
Notice that only the third column has values greater than 4, and that only two of its elements (in rows 1 and 4) pass the test. Now, we do
dat[dat[cbind(seq_along(dat[[1]]), max.col(dat))] > 4, ]
a b c
1 1 3 5
4 2 3 6
Here, max.col(dat) returns, for each row, the index of the column that holds that row's maximum value. seq_along(dat[[1]]) runs through the row numbers. cbind combines the two into a two-column matrix of (row, column) pairs, which we use to pull out the maximum value of each row by matrix subsetting. Comparing these values with > 4 returns a logical vector whose length is the number of rows of the data.frame. This vector is used to subset the data.frame by row.
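To make the one-liner easier to follow, here is a sketch that splits it into its individual steps and checks the result against the apply version. The data are typed in from the printed output above rather than regenerated with sample, so the example does not depend on the random number generator. The intermediate names (max_idx, idx, row_max, keep) are only illustrative, and ties.method = "first" is an assumption added to make the column indices reproducible (the default breaks ties at random, which does not change the row maxima).

```r
# Sample data copied from the printed output above (not regenerated with sample()).
dat <- data.frame(a = c(1, 2, 2, 2, 3),
                  b = c(3, 1, 1, 3, 3),
                  c = c(5, 4, 2, 6, 2))

# Step 1: column index of each row's maximum.
# ties.method = "first" only makes the indices deterministic; the default
# ("random") may pick a different tied column, but the maximum is the same.
max_idx <- max.col(dat, ties.method = "first")

# Step 2: pair every row number with the index of its maximum column.
idx <- cbind(seq_len(nrow(dat)), max_idx)

# Step 3: matrix subsetting pulls out one value per row, the row maximum.
row_max <- as.matrix(dat)[idx]

# Step 4: logical subsetting keeps the rows whose maximum exceeds 4.
keep <- row_max > 4
result <- dat[keep, ]

# Cross-check against the slower apply() formulation.
baseline <- dat[apply(dat, 1, max) > 4, ]
stopifnot(isTRUE(all.equal(result, baseline)))
```

On this small example both approaches select rows 1 and 4; any speed difference would only become noticeable on data with many rows.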