26

How to subset data in R without losing NA rows?

The post above subsets using logical indexing. Is there a way to do it in dplyr?

Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:

b = a %>% filter(col != "str")

I would expect this not to exclude NA values, but it does. However, when I use a different form of filtering, NAs are not excluded automatically, e.g.:

b = a %>% filter(!grepl("str", col))

I would like to understand this feature of filter. I would appreciate any help. Thank you!

Andrew Gustar
Brent Carbonera

3 Answers

38

The documentation for dplyr::filter says... "Unlike base subsetting, rows where the condition evaluates to NA are dropped."

NA != "str" evaluates to NA so is dropped by filter.

!grepl("str", NA) returns TRUE, so is kept.
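To see the difference outside of filter, here is a minimal base R sketch. The vector x is a made-up example, not the asker's data.

```r
# Hypothetical example vector; the middle element is missing.
x <- c("hello", NA, "str")

# Comparing NA with a string gives NA, which filter() treats as "drop this row".
ne_result <- x != "str"

# grepl() returns FALSE for NA input, so the negation is TRUE and the row is kept.
grep_result <- !grepl("str", x)

print(ne_result)
print(grep_result)
```

So the two conditions differ only in what they return for the NA element.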

If you want filter to keep NA, you could use filter(is.na(col) | col != "str")
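A small sketch of both behaviours side by side, assuming dplyr is installed. The data frame a and the result names dropped and kept are invented for illustration.

```r
library(dplyr)

# Hypothetical data: one ordinary value, one NA, one value to exclude.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# The condition is NA for the missing row, so filter() drops it.
dropped <- a %>% filter(col != "str")

# is.na(col) is TRUE for the missing row, and TRUE | NA is TRUE, so it is kept.
kept <- a %>% filter(is.na(col) | col != "str")

print(dropped)
print(kept)
```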

Andrew Gustar
  • 2
    This solution won't do since if col != "str" returns FALSE but is.na(col) returns TRUE, it will be kept. So the filter fails. I'm actually using both filter examples in one filter. So it goes b = a %>% filter(col != "str1", !grepl("str2", col)). This is what I do but it also filters out NA...that's the problem – Brent Carbonera Sep 25 '17 at 07:58
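For the combined case in the comment, one possible approach (a sketch, not something confirmed in this thread) is to apply is.na() once around the whole condition instead of to each part. The sample values below are made up.

```r
library(dplyr)

# Hypothetical data covering each case: keep, NA, exact match, pattern match.
a <- data.frame(col = c("hello", NA, "str1", "has str2"),
                stringsAsFactors = FALSE)

# Combine the two tests with &, then let is.na(col) rescue the missing rows.
b <- a %>% filter(is.na(col) | (col != "str1" & !grepl("str2", col)))

print(b)
```

Here the rows "str1" and "has str2" are removed, while the NA row survives because is.na(col) is TRUE for it.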
16

If you want to keep NAs created by the filter condition you can simply turn the condition NAs into TRUEs using replace_na from tidyr.

library(dplyr)
library(tidyr)
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))
qwr
2

I just ran into this issue. It is very easy to miss, and I must say I find this behavior somewhat unintuitive. Based on qwr's answer, this is becoming a staple in all my projects from now on:

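# Requires dplyr (filter, %>%) and tidyr (replace_na).
filter_na <- function(tbl, expr) {
  # Treat NA results of the condition as TRUE so those rows are kept.
  tbl %>% filter({{ expr }} %>% replace_na(TRUE))
}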
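A quick usage sketch for a helper like this, assuming dplyr and tidyr are installed. The helper is repeated so the snippet runs on its own, and the data frame a is a made-up example.

```r
library(dplyr)
library(tidyr)

# Same helper as above, repeated for a self-contained example.
filter_na <- function(tbl, expr) {
  tbl %>% filter({{ expr }} %>% replace_na(TRUE))
}

# Hypothetical data with one missing value.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# Rows where the condition is NA are treated as TRUE and therefore kept.
result <- filter_na(a, col != "str")

print(result)
```

Note that this keeps every row where the condition is NA, which may not be what you want if NA should count as a failed test for some columns.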
user11130854