How to filter data without losing NA rows using dplyr

Question

How to subset data in R without losing NA rows?

The post above subsets using logical indexing. Is there a way to do it in dplyr?

Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:

b = a %>% filter(col != "str")

I would have expected this to keep NA values, but it drops them. When I filter a different way, however, NA rows are not automatically excluded, e.g.:

b = a %>% filter(!grepl("str", col))

I would like to understand this feature of filter. I would appreciate any help. Thank you!

Here is the package's author comment: https://github.com/tidyverse/dplyr/issues/3196#issuecomment-342599352 — Ruam Pimentel, Aug 23 '23 at 01:56

score 38 · Accepted Answer · answered Sep 23 '17 at 10:26


The documentation for dplyr::filter says... "Unlike base subsetting, rows where the condition evaluates to NA are dropped."

NA != "str" evaluates to NA, so that row is dropped by filter.

grepl("str", NA) returns FALSE rather than NA, so !grepl("str", NA) is TRUE and the row is kept.

If you want filter to keep NA, you could do filter(is.na(col) | col != "str").
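As a minimal sketch of the three cases above, the following uses a small made-up data frame (the name `a` and its values are illustrative, not taken from the asker's data):

```r
library(dplyr)

# Hypothetical example data: one ordinary value, one NA, one value to remove.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# Comparing with NA yields NA; grepl() treats NA as "no match" and returns FALSE.
na_compare <- NA != "str"
na_grepl   <- !grepl("str", NA)

# filter() drops the row whose condition evaluates to NA.
dropped <- a %>% filter(col != "str")

# Adding is.na(col) keeps the NA row while still removing "str".
kept <- a %>% filter(is.na(col) | col != "str")

print(dropped)
print(kept)
```

Here `dropped` should contain only the "hello" row, while `kept` should contain both the "hello" row and the NA row.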


Andrew Gustar


2

This solution won't do since if col != "str" returns FALSE but is.na(col) returns TRUE, it will be kept. So the filter fails. I'm actually using both filter examples in one filter. So it goes b = a %>% filter(col != "str1", !grepl("str2", col)). This is what I do but it also filters out NA...that's the problem – Brent Carbonera Sep 25 '17 at 07:58

score 16 · Answer 2 · answered Jun 19 '19 at 19:59

If you want to keep NAs created by the filter condition you can simply turn the condition NAs into TRUEs using replace_na from tidyr.

library(dplyr)
library(tidyr)
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))
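To make the intermediate step visible, here is a small sketch of what the condition looks like before and after `replace_na`. It also shows `dplyr::coalesce` as a possible alternative that avoids the tidyr dependency; that variant is an assumption of this sketch, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same illustrative data as in the answer.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# The raw condition contains an NA for the NA row.
cond <- a$col != "str"

# replace_na() turns that NA into TRUE, so filter() keeps the row.
cond_filled <- replace_na(cond, TRUE)

via_replace_na <- a %>% filter(replace_na(col != "str", TRUE))

# Alternative (assumption, not from the answer): coalesce() fills NA with TRUE too.
via_coalesce <- a %>% filter(coalesce(col != "str", TRUE))

print(via_replace_na)
```

Both variants should return the "hello" row and the NA row, and drop only the "str" row.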

edited Feb 09 '23 at 21:36

answered Jun 19 '19 at 19:59

qwr


This is the only solution that actually worked for me! Very elegant too! Thank you!! – James Cutler Feb 11 '22 at 16:30

score 2 · Answer 3 · answered Dec 10 '22 at 22:18


I just ran into this issue. It is very easy to miss, and I must say I find this behavior somewhat unintuitive. Based on qwr's answer, this is becoming a staple in all my projects from now on:

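library(dplyr)
library(tidyr)
# Like filter(), but rows where the condition evaluates to NA are kept.
filter_na <- function(tbl, expr) {
  tbl %>% filter({{ expr }} %>% replace_na(TRUE))
}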
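A short usage sketch for a helper like this, on made-up data. The helper is redefined here so the snippet runs on its own, and it is written with a nested call instead of the inner pipe; that is a stylistic assumption of this sketch, and the behaviour is meant to be the same.

```r
library(dplyr)
library(tidyr)

# Same idea as the helper above, written without the inner pipe.
filter_na <- function(tbl, expr) {
  filter(tbl, replace_na({{ expr }}, TRUE))
}

# Hypothetical example data.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# Plain filter() loses the NA row; filter_na() keeps it.
plain   <- a %>% filter(col != "str")
with_na <- a %>% filter_na(col != "str")

print(plain)
print(with_na)
```

Note that the helper takes a single condition, so several conditions would need to be combined with `&` before being passed in.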


user11130854


The behavior matches SQL's WHERE clause, which drops rows where the condition is UNKNOWN under SQL's three-valued logic (3VL). – qwr Aug 23 '23 at 02:45
