Dplyr - Filter if any variable is equal to a value

Question

I have a dataset a with 5 variables and want to filter it like this:

a1 <- a %>% filter(var_1 != 1 , var_2 != 1 , var_3 != 1 , var_4 != 1 , variable_5 != 1)

I was wondering if anything like this (pseudo code) existed:

a1 <- a %>% filter(anyvariable != 1)

In other words I would like to get rid of all the rows with value 1, no matter where it appears. 1 is just a random number. It could have been 9, 99, or whatever else! Thanks!

Your `|` condition implies you want to filter and keep rows where not all columns have a 1, but your wording says get rid of all rows with a 1 in any column position. Can you clarify? — Gopala, May 21 '16 at 13:45
Hi Gopala, thanks. It should have been with commas rather than |. The wording is correct. — Gaspare, May 21 '16 at 13:49
Did you mean to remove rows containing any value of 1? Your code is confusing. — akrun, May 21 '16 at 13:51
Hi guys, apologies. I want to remove all the rows with a 1 no matter where it appears. In other words, I want to remove all the rows with at least a 1. — Gaspare, May 21 '16 at 13:52
Please check my update. I guess it should work for what you mentioned — akrun, May 21 '16 at 13:53

George Wood · Answer 1 · 2018-06-01T17:17:15.467

You can use filter_all in combination with all_vars from dplyr, as follows:

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, 4, 1),
                    var3 = c(1, 6, 5))

# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b      4.00  6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))

# # A tibble: 1 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 b      4.00  6.00

This removes every row in which any variable equals 1; in the example above, that is the first and third rows. However, be cautious with NA values:

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))
# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b     NA     6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))  

# # A tibble: 0 x 3
# # ... with 3 variables: var1 <chr>, var2 <dbl>, var3 <dbl>

Note that the second row does not contain a 1 but is filtered out anyway: NA != 1 evaluates to NA, and filter drops rows whose condition is NA. In this specific example, you can avoid that behavior with:

some_data %>% filter_all(all_vars(. != 1 | is.na(.)))

However, this may not generalize well.
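
One possible way to sidestep the NA issue, offered as a sketch rather than a general fix, is to test membership with %in%, which returns FALSE instead of NA for a missing value. The example reuses some_data from above and assumes that rows with missing values should be kept:

```r
library(dplyr)

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))

# NA %in% 1 is FALSE (not NA), so negating it keeps rows with missing values
kept <- some_data %>% filter_all(all_vars(!(. %in% 1)))
kept
```

Only the second row (var1 "b") should remain. Note that %in% coerces types in the same way as !=, so a character column containing "1" would also be treated as a match.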

These answers still work, but the use of scoped variant verbs has been superseded by the use of `across`, `if_any`, and `if_all`. See my [answer](https://stackoverflow.com/questions/37363583/dplyr-filter-if-any-variable-is-equal-to-a-value/68218366#68218366) below for code that uses the new syntax. — Josh, Jul 02 '21 at 00:46

akrun · Answer 2 · 2016-05-21T13:50:03.777

We might be able to use rowSums

a %>% 
  filter(rowSums(. !=0) >0)
#    Col1 Col2
#1    1    1
#2    0   24
#3    9    1

If I change it to !=1

a %>% 
   filter(rowSums(. != 1) > 0)
#   Col1 Col2
#1    0   24
#2    9    1
#3    0    0

Note that this removes only the rows in which every value is 1. The previous version removes the rows in which every value is 0, which is consistent with what the OP mentioned in the post.

Update

If the OP wants to remove rows containing any 1 (1 is just an example; it could equally be 9, 99, or 999):

a %>% 
   filter(!rowSums(.==1))
#    Col1 Col2
#1    0   24
#2    0    0

data

a <- data.frame(Col1 = c(1, 0, 9, 0), Col2 = c(1, 24, 1, 0))

edited May 21 '16 at 13:50

answered May 21 '16 at 13:33

akrun

I should have added that 1 is just a random value. It could have been 9 or 99 or whatever else. Thanks for the suggestion! – Gaspare May 21 '16 at 13:40
@Gaspare That is why I put it as `!=0`. You can change it based on the value you have in mind – akrun May 21 '16 at 13:41
If there are NA's in the data, may also need to add na.rm=TRUE – dww May 21 '16 at 14:14
This works with the pipe, but not without it: `a %>% filter(!rowSums(.==1))` returns the expected two rows, whereas `filter(a,!rowSums(.==1))` fails with `Error: object '.' not found`. Any suggestions? – jafelds Mar 16 '17 at 15:46
In `tidyverse`, we are supposed to use pipes. Otherwise, it can be done in `base R` itself: `a[rowSums(a!=0)>0,]` – akrun Mar 16 '17 at 17:22
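
The point raised in the comments about missing values can be illustrated with a small sketch. The data frame a_na is hypothetical: it is the answer's a with one value replaced by NA.

```r
library(dplyr)

# Hypothetical variant of `a` with one missing value in the second row
a_na <- data.frame(Col1 = c(1, 0, 9, 0), Col2 = c(1, NA, 1, 0))

# Without na.rm, the row sum for the second row is NA, so filter() drops that row
without_na_rm <- a_na %>% filter(!rowSums(. == 1))

# With na.rm = TRUE, NA comparisons are ignored and the second row is kept
with_na_rm <- a_na %>% filter(!rowSums(. == 1, na.rm = TRUE))
```

Whether a row containing NA should be kept or dropped depends on the analysis; na.rm = TRUE simply treats a missing cell as "not equal to 1".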

jafelds · Answer 3 · 2017-03-16T16:24:38.810

Here are some convenient functions in the form OP requested:

filter_any <- function(data, test_val, na.rm = TRUE)
{
      # JAF 20170316 filter by comparing test_val to any column, returning rows that have test_val in any column
      # The count is compared with > 0 explicitly, because inside filter() `!!` is the unquote operator, not a double negation
      data %>% filter(rowSums(. == test_val, na.rm = na.rm) > 0)
}
filter_exclude <- function(data, test_val, na.rm = TRUE)
{
      # JAF 20170316 filter by comparing test_val to every column, excluding rows that have test_val in any column
      data %>% filter(rowSums(. == test_val, na.rm = na.rm) == 0)
}

Here is the result on OP's test variable:

> a
  Col1 Col2
1    1    1
2    0   24
3    9    1
4    0    0
> a %>% filter_exclude(test_val=1)
  Col1 Col2
1    0   24
2    0    0
> a %>% filter_any(test_val=1)
  Col1 Col2
1    1    1
2    9    1
>

These functions have the benefit of working without the pipe notation:

> filter_exclude(a,test_val=1)
  Col1 Col2
1    0   24
2    0    0
> filter_any(a,test_val=1)
  Col1 Col2
1    1    1
2    9    1
>

score 1 · Answer 4 · answered May 21 '16 at 13:49

You can try to combine with the apply function in the pipeline:

# Output of dput(df), assigned to df so that the example can be run
df <- structure(list(x = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 1L), y = c(1L, 
2L, 2L, 1L, 1L, 2L, 3L, 3L), z = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA, 
-8L))

df %>% filter(!apply(., 1, function(row) any(row == 1)))
  x y z
1 2 2 2
2 2 3 2

answered May 21 '16 at 13:49

Psidom

This is going to be slower vs. even base subsetting on `rowSums`. – Gopala May 21 '16 at 13:59
I agree that `rowSums()` is a much better way to approach the problem. – Psidom May 21 '16 at 14:15

score 1 · Answer 5 · answered May 21 '16 at 14:06

There is no filter_each in dplyr, so a solution based on rowSums is a viable one. I am posting this very simple base R option, although one may prefer a filter solution so that the output can feed into a dplyr pipeline with additional operations.

set.seed(1)
df <- data.frame(x = sample(0:1, 10, replace = TRUE),
                 y = sample(0:1, 10, replace = TRUE))
df[rowSums(df == 1) == 0, ]
  x y
1 0 0
2 0 0

Changing the 1 above to any other value filters on that value instead. This solution is considerably faster than the apply-based filter solution and marginally slower than dplyr's filter with rowSums.
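
To make the "any other value" point concrete, the base R subsetting can be wrapped in a small function. This is only a sketch: drop_rows_with is a hypothetical helper name, and the example data frame is made up for illustration.

```r
# Hypothetical helper: drop every row of `df` that contains `value` in any column
drop_rows_with <- function(df, value) {
  # na.rm = TRUE treats missing cells as "not equal to value"
  df[rowSums(df == value, na.rm = TRUE) == 0, , drop = FALSE]
}

example_df <- data.frame(x = c(0, 9, 0, 1), y = c(0, 0, 5, 9))

# Rows 2 and 4 contain a 9, so rows 1 and 3 remain
no_nines <- drop_rows_with(example_df, 9)

# Only row 4 contains a 1, so rows 1 to 3 remain
no_ones <- drop_rows_with(example_df, 1)
```

The drop = FALSE argument keeps the result a data frame even when the input has a single column.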

score 1 · Answer 6 · answered Jul 02 '21 at 00:43

@George Wood's answer works, but all_vars has been superseded by if_all, which is used inside an existing verb (here filter) rather than with the scoped variant filter_all. The examples in that answer can be updated by changing `some_data %>% filter_all(all_vars(.` to `some_data %>% filter(if_all(.fns = ~ .x`
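
A complete version of that translation might look like the following sketch. It assumes dplyr 1.0.4 or later (where if_all and if_any are available) and reuses the NA example from the earlier answer; passing everything() as the column selection is a choice made here for explicitness.

```r
library(dplyr)

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))

# Scoped-verb version from the earlier answer:
#   some_data %>% filter_all(all_vars(. != 1 | is.na(.)))
# Sketch of the same filter with if_all(); everything() selects all columns
kept <- some_data %>%
  filter(if_all(everything(), ~ .x != 1 | is.na(.x)))

# The complementary filter (keep rows where any column equals 1) uses if_any()
has_one <- some_data %>%
  filter(if_any(everything(), ~ .x == 1))
```

Here kept should contain only the row with var1 "b", and has_one the rows with var1 "a" and "c".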
