
I have a dataset `a` with 5 variables and want to filter it like this:

a1 <- a %>% filter(var_1 != 1, var_2 != 1, var_3 != 1, var_4 != 1, variable_5 != 1)

I was wondering whether something like this (pseudocode) exists:

a1 <- a %>% filter(anyvariable != 1)

In other words, I would like to remove every row that contains the value 1, no matter which column it appears in. The value 1 is arbitrary; it could just as well be 9, 99, or anything else. Thanks!

Gaspare
  • Your `|` condition implies you want to filter and keep rows where not all columns have a 1, but your wording says get rid of all rows with a 1 in any column position. Can you clarify? – Gopala May 21 '16 at 13:45
  • Hi Gopala, thanks. It should have been commas rather than `|`. The wording is correct. – Gaspare May 21 '16 at 13:49
  • Did you mean to remove any row that contains a 1? Your code is confusing. – akrun May 21 '16 at 13:51
  • Hi guys, apologies. I want to remove every row that contains a 1, no matter where it appears; in other words, every row with at least one 1. – Gaspare May 21 '16 at 13:52
  • Please check my update. I think it should work for what you described. – akrun May 21 '16 at 13:53

6 Answers


You can use filter_all in combination with all_vars from dplyr, as follows:

library(dplyr)

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, 4, 1),
                    var3 = c(1, 6, 5))

# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b      4.00  6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))

# # A tibble: 1 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 b      4.00  6.00

This removes every row in which any variable equals 1; in the example above, that is the first and third rows. However, be cautious with NA values:

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))
# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b     NA     6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))  

# # A tibble: 0 x 3
# # ... with 3 variables: var1 <chr>, var2 <dbl>, var3 <dbl>

Note that the second row does not contain a 1 but is filtered out anyway: NA != 1 evaluates to NA rather than TRUE, and filter drops rows whose condition is NA. In this specific example, you can avoid this behavior with:

some_data %>% filter_all(all_vars(. != 1 | is.na(.)))

However, this may not generalize well.
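One way to make the NA handling less example-specific, sketched here under the assumption that a dplyr version providing `if_all()` is installed, is to test membership with `%in%` instead of `!=`. `NA %in% 1` is FALSE rather than NA, so a row is dropped only when one of its columns actually holds the value. `bad_value` is a hypothetical name for the value to exclude.

```r
library(dplyr)

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))

bad_value <- 1  # hypothetical: the value to exclude (could be 9, 99, ...)

# Keep a row only if no column matches bad_value; %in% returns FALSE
# (not NA) for missing values, so the row with the NA is kept.
kept <- some_data %>%
  filter(if_all(everything(), ~ !(.x %in% bad_value)))

print(kept)
```

Because `%in%` accepts a vector, setting `bad_value <- c(1, 9)` would drop rows containing either value.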

George Wood
  • These answers still work, but the use of scoped variant verbs has been superseded by the use of `across`, `if_any`, and `if_all`. See my [answer](https://stackoverflow.com/questions/37363583/dplyr-filter-if-any-variable-is-equal-to-a-value/68218366#68218366) below for code that uses the new syntax. – Josh Jul 02 '21 at 00:46

We might be able to use rowSums:

a %>% 
  filter(rowSums(. != 0) > 0)
#    Col1 Col2
#1    1    1
#2    0   24
#3    9    1

If I change it to != 1:

a %>% 
  filter(rowSums(. != 1) > 0)
#   Col1 Col2
#1    0   24
#2    9    1
#3    0    0

Note that this removes the rows that are all 1s. In the previous case, it removes the rows that are all 0s, which is consistent with what the OP originally described in the post.

Update

If the OP wants to remove rows that contain a 1 anywhere (1 is just a placeholder; the same works for 9, 99, or 999):

a %>% 
  filter(!rowSums(. == 1))
#    Col1 Col2
#1    0   24
#2    0    0

data

a <- data.frame(Col1 = c(1, 0, 9, 0), Col2 = c(1, 24, 1, 0))
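A caveat with rowSums is missing values: if a row contains an NA, `rowSums(. == 1)` returns NA for it unless `na.rm = TRUE` is set, and filter then drops that row. The sketch below uses a hypothetical copy of the data above with two values replaced by NA (`a_na` is an illustrative name, not part of the original data). It also shows the non-pipe form, which passes the data frame explicitly instead of `.`.

```r
library(dplyr)

# Hypothetical variant of `a` with missing values added
a_na <- data.frame(Col1 = c(1, 0, 9, NA), Col2 = c(1, 24, NA, 0))

# Without na.rm, rows 3 and 4 give NA sums and are silently dropped
no_narm <- a_na %>% filter(!rowSums(. == 1))

# With na.rm = TRUE, only row 1 (the one that really contains a 1) is removed;
# written without the pipe, so the data frame is named explicitly
with_narm <- filter(a_na, !rowSums(a_na == 1, na.rm = TRUE))

print(no_narm)
print(with_narm)
```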
akrun
  • I should have added that 1 is just a random value. It could have been 9 or 99 or whatever else. Thanks for the suggestion! – Gaspare May 21 '16 at 13:40
  • @Gaspare That is why I wrote it as `!= 0`. You can change it to whatever value you have in mind. – akrun May 21 '16 at 13:41
  • If there are NAs in the data, you may also need to add `na.rm = TRUE`. – dww May 21 '16 at 14:14
  • This works with the pipe, but not without it: `a %>% filter(!rowSums(.==1))` returns the rows `0 24` and `0 0`, but `filter(a, !rowSums(.==1))` fails with `Error: object '.' not found`. Any suggestions? – jafelds Mar 16 '17 at 15:46
  • In the `tidyverse` we are supposed to use pipes; otherwise it can be done in base R itself: `a[rowSums(a != 0) > 0, ]` – akrun Mar 16 '17 at 17:22

Here are some convenient functions in the form the OP requested:

library(dplyr)

filter_any <- function(data, test_val, na.rm = TRUE)
{
      # JAF 20170316 filter by comparing test_val to any column, returning rows that have test_val in any column
      hits <- rowSums(data == test_val, na.rm = na.rm) > 0
      out <- filter(data, !!hits)   # !! injects the precomputed logical vector, so a column named hits cannot mask it
      return(out)
}
filter_exclude <- function(data, test_val, na.rm = TRUE)
{
      # JAF 20170316 filter by comparing test_val to every column, excluding rows that have test_val in any column
      keep <- rowSums(data == test_val, na.rm = na.rm) == 0
      out <- filter(data, !!keep)
      return(out)
}

Here is the result on OP's test variable:

> a
  Col1 Col2
1    1    1
2    0   24
3    9    1
4    0    0
> a %>% filter_exclude(test_val=1)
  Col1 Col2
1    0   24
2    0    0
> a %>% filter_any(test_val=1)
  Col1 Col2
1    1    1
2    9    1
>

These functions have the benefit of working without the pipe notation:

> filter_exclude(a,test_val=1)
  Col1 Col2
1    0   24
2    0    0
> filter_any(a,test_val=1)
  Col1 Col2
1    1    1
2    9    1
>
jafelds

You can combine filter with the apply function in the pipeline:

dput(df)
structure(list(x = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 1L), y = c(1L, 
2L, 2L, 1L, 1L, 2L, 3L, 3L), z = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA, 
-8L))

df %>% filter(!apply(., 1, function(row) any(row == 1)))
  x y z
1 2 2 2
2 2 3 2
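One caveat, based on how base R converts a data frame for `apply()`: the data frame is turned into a matrix first. With the all-integer `df` above that is harmless, but if any column is character, the numeric columns are converted with `format()`, which pads them to a common width (e.g. `" 1"`), so `row == 1` can miss real matches. Below is a sketch of a column-wise alternative that avoids the conversion, using a small hypothetical data frame `mixed`.

```r
library(dplyr)

# Hypothetical data with one character column
mixed <- data.frame(id = c("a", "b"), n = c(1, 24))

# apply() sees n as " 1" and "24" after the matrix conversion,
# so this finds no 1 at all
apply_hits <- apply(mixed, 1, function(row) any(row == 1))

# Column-wise check: compare each column in its own type,
# then combine the per-column results with OR
col_hits <- Reduce(`|`, lapply(mixed, function(col) col %in% 1))

kept <- mixed %>% filter(!col_hits)
print(kept)
```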
Psidom

There is no filter_each in dplyr, so a solution based on rowSums is a viable one. Here is a very simple base R option, although you may prefer a filter solution if you want to use the result in a dplyr pipeline with additional operations.

set.seed(1)
df <- data.frame(x = sample(0:1, 10, replace = TRUE),
                 y = sample(0:1, 10, replace = TRUE))
df[rowSums(df == 1) == 0, ]
  x y
1 0 0
2 0 0

Replace 1 above with any other value to filter on that value instead. This solution is considerably faster than the apply-based filter solution and marginally slower than dplyr's filter with rowSums.
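The speed comparison above comes without a benchmark. A rough way to check it on your own machine is sketched below with base `system.time()` and a hypothetical 20,000-row data frame `big`; timings depend on the machine and the data, so none are quoted here. The sketch also confirms that the three approaches keep the same rows.

```r
library(dplyr)

set.seed(1)
# Hypothetical larger data in the same shape as df above
big <- data.frame(x = sample(0:1, 20000, replace = TRUE),
                  y = sample(0:1, 20000, replace = TRUE))

# Base R: index rows with the rowSums condition
t_base  <- system.time(base_res  <- big[rowSums(big == 1) == 0, ])
# dplyr filter with the same rowSums condition
t_dplyr <- system.time(dplyr_res <- filter(big, rowSums(big == 1) == 0))
# dplyr filter with a row-wise apply()
t_apply <- system.time(
  apply_res <- filter(big, !apply(big, 1, function(row) any(row == 1)))
)

# Elapsed seconds for each approach
print(c(base  = t_base[["elapsed"]],
        dplyr = t_dplyr[["elapsed"]],
        apply = t_apply[["elapsed"]]))
```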

Gopala

@George Wood's answer works, but all_vars has been superseded by if_all used inside an existing verb, in this case filter instead of the scoped variant filter_all. @George Wood's answers can be updated by replacing `some_data %>% filter_all(all_vars(.` with `some_data %>% filter(if_all(.fns = ~ .x`.
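Spelled out in full, the updated version of the first example looks roughly like this, assuming a dplyr version that provides `if_all()` and `if_any()`. Here `.cols` is given explicitly as `everything()`, and `target` is a hypothetical variable holding the value to exclude.

```r
library(dplyr)

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, 4, 1),
                    var3 = c(1, 6, 5))

target <- 1  # hypothetical: the value to exclude

# Keep rows where every column differs from target
kept_all <- some_data %>% filter(if_all(everything(), ~ .x != target))

# Equivalent formulation: drop rows where any column equals target
kept_any <- some_data %>% filter(!if_any(everything(), ~ .x == target))

print(kept_all)
```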

Josh