
There are several ways to identify and manipulate individual cells with missing data in R, e.g., with `complete.cases` or even `rowSums` (illustrated below after the example data).

However, I've not been able to find---or figure out myself---an expedient way to select rows that have missing data within a subsetted range of columns.

For example, in the data frame `df`:

df <- data.frame(D1 = c('A', 'B', 'C', 'D'),
                 D2 = c(NA, 0, 1, 1),
                 V1 = c(11, NA, 33, NA),
                 V2 = c(111, 222, NA, NA)
                 )
df

#   D1 D2 V1  V2
# 1  A NA 11 111
# 2  B  0 NA 222
# 3  C  1 33  NA
# 4  D  1 NA  NA
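
For reference, here is roughly how the whole-data-frame tools mentioned above behave on df; they flag rows with any missing cell, not a chosen range of columns:

complete.cases(df)   # FALSE FALSE FALSE FALSE (every row has at least one NA)
rowSums(is.na(df))   # 1 1 1 2 (count of NAs per row, across all columns)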

I would like to select all rows that have missing data in both columns V1 and V2, thus selecting row D but not rows B or C (or A).

I have a larger range of columns than given in that toy example, so combining each column's test with, e.g., `&` could make for a long command.

N.B., a similar SO question addresses selecting rows where none are NAs.

wes
2 Answers


You can try this:

library(dplyr)
df %>% filter(is.na(V1) & is.na(V2))

OUTPUT

  D1 D2 V1 V2
1  D  1 NA NA
Isaac
  • Thank you (+1). Any ideas on selecting a range of columns as opposed to each column via `&`? – wes Sep 04 '22 at 07:39

You can use `dplyr::if_all`. The columns can be selected very flexibly with tidyselect, for instance with `:`, `c()`, `starts_with()`, etc.

library(dplyr)
df %>% 
  filter(if_all(V1:V2, is.na))

#  D1 D2 V1 V2
#1  D  1 NA NA

These also work (showing the flexibility of tidyselect):

filter(df, if_all(3:4, is.na))
filter(df, if_all(starts_with("V"), is.na))
filter(df, if_all(c(V1, V2), is.na))
filter(df, if_all((last_col()-1):last_col(), is.na))
filter(df, if_all(num_range("V", 1:2), is.na))
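
For comparison, a rough base-R equivalent of the same range-based selection, building on the rowSums idea from the question (the column set shown is just the toy example's):

cols <- c("V1", "V2")                            # or 3:4, or grep("^V", names(df), value = TRUE)
df[rowSums(is.na(df[cols])) == length(cols), ]   # rows where every selected column is NA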
Maël