How to subset specific values from whole data.frame without defining every column?

Question

My data.frame (df) consists of 20 different columns.
All of the columns hold integer values (range 0 - 99).

Let's say I would like to keep only the rows in which both col1 and col2 have values lower than 4.
So my code could be:

df2 <- subset(df, col1 < 4 & col2 < 4)

That's fine.

But how can I modify my code to apply the same condition to all 20 columns without specifying each column individually?

Thanks for your help!

It would be easier to help you if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data that can be used to test and verify possible solutions. — MrFlick, Aug 28 '17 at 18:09
@MrFlick correct -> I'm looking for an efficient way to code cases like that from my example — , Aug 28 '17 at 18:16

score 2 · Answer 1 · answered Aug 28 '17 at 18:20

df2 <- df[apply(df, 1, max) < 4,]
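
A minimal sketch of what this one-liner does, assuming a small made-up data frame (the values and the three-column layout below are illustrative, not the asker's data). `apply(df, 1, max)` computes the maximum of each row, and a row's maximum is below 4 exactly when every value in that row is below 4:

```r
# Hypothetical data: three integer columns standing in for the asker's 20.
df <- data.frame(col1 = c(1L, 5L, 2L, 0L),
                 col2 = c(3L, 1L, 2L, 9L),
                 col3 = c(2L, 2L, 3L, 1L))

# Row-wise maximum; apply() converts the data.frame to a matrix first.
row_max <- apply(df, 1, max)

# TRUE for rows in which all values are below 4.
keep <- row_max < 4

# Logical subsetting by row keeps every column without naming any of them.
df2 <- df[keep, ]
print(df2)
```

Because the condition is expressed on the row maximum, the same line works unchanged for any number of columns.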


dvantwisk


lmo · Accepted Answer · 2017-08-29T11:43:59.173

Here is a method that is faster than apply, using max.col, matrix subsetting, and logical subsetting. First, construct a sample dataset.

set.seed(1234)
dat <- data.frame(a=sample(1:3, 5, replace=TRUE),
                  b=sample(1:4, 5, replace=TRUE),
                  c=sample(1:6, 5, replace=TRUE))

It looks like this.

Notice that only the third column has values greater than 4, and that only two of its elements pass the test. Now, we do

dat[dat[cbind(seq_along(dat[[1]]), max.col(dat))] > 4, ]
  a b c
1 1 3 5
4 2 3 6

Here, max.col(dat) returns, for each row, the index of the column that holds the maximum value. seq_along(dat[[1]]) runs through the row numbers. cbind combines the two into a two-column matrix of (row, column) positions, which we use to pull out the maximum value of each row via matrix subsetting. Comparing these values with > 4 then returns a logical vector, with one element per row of the data.frame, that is TRUE where the row maximum exceeds 4. This vector is used to subset the data.frame by row.
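
The steps above can be sketched one at a time. This is only an illustration under stated assumptions: the data frame below is hand-made (not the seeded sample from the answer), the intermediate names `m`, `rows`, `cols`, and `row_max` are introduced here for readability, and `ties.method = "first"` is set so the column indices are deterministic (the default breaks ties at random, which does not change the maximum value itself).

```r
# Hypothetical data with known values.
dat <- data.frame(a = c(1L, 3L, 2L),
                  b = c(3L, 2L, 1L),
                  c = c(5L, 1L, 6L))

m    <- as.matrix(dat)                      # work on a matrix explicitly
rows <- seq_len(nrow(m))                    # row numbers 1, 2, 3
cols <- max.col(m, ties.method = "first")   # column index of each row's maximum

# Matrix subsetting: one (row, column) pair per row yields one value per row.
row_max <- m[cbind(rows, cols)]

above_4 <- dat[row_max > 4, ]   # rows containing at least one value above 4
below_4 <- dat[row_max < 4, ]   # rows in which every value is below 4

print(above_4)
print(below_4)
```

Flipping the comparison to `< 4` gives the subset asked for in the question, since a row whose maximum is below 4 has all of its values below 4.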

great alternative! thank you very much for your answer and help! :) — , Aug 29 '17 at 07:43
@dvantwisk I upvoted your answer because it looks cleaner, but `apply` is notoriously slow, especially for data.frames. It is essentially a fancy wrapper for a `for` loop that has to convert the data.frame to a matrix prior to running that loop. `max.col`, while still requiring matrix conversion, is optimized under the hood, and matrix indexing (subsetting) is usually incredibly fast. — lmo, Aug 29 '17 at 14:08
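
A rough way to check the speed claim on your own machine, sketched with base R only. The data frame `big`, its size, and the seed are assumptions made for this illustration; timings vary by machine and R version, so no numbers are quoted here. With 20 columns drawn from 0 to 99, few or no rows will actually satisfy the condition; the point is only to compare run time and confirm both approaches agree.

```r
set.seed(42)  # arbitrary seed, chosen only for this illustration

# Hypothetical large input: 200,000 rows by 20 integer columns in 0..99.
big <- as.data.frame(matrix(sample(0:99, 2e5 * 20, replace = TRUE), ncol = 20))

# Approach 1: row-wise maximum via apply().
t_apply <- system.time(
  keep_apply <- apply(big, 1, max) < 4
)

# Approach 2: max.col() plus matrix subsetting.
t_maxcol <- system.time({
  m <- as.matrix(big)
  idx <- cbind(seq_len(nrow(m)), max.col(m, ties.method = "first"))
  keep_maxcol <- m[idx] < 4
})

# Both approaches should flag exactly the same rows.
stopifnot(identical(unname(keep_apply), keep_maxcol))

# Elapsed seconds for each approach.
print(c(apply = t_apply[["elapsed"]], max.col = t_maxcol[["elapsed"]]))
```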
