drop levels of factor for which there is one missing value for one column r

Question

I would like to drop any occurrence of a factor level for which one row contains a missing value

Example:

ID var1 var2
1  1    2
1  NA   3
2  1    2
2  2    4

So, in this hypothetical, what would be left would be:

ID var1 var2
2  1    2
2  2    4

score 4 · Answer 1 · answered Dec 18 '14 at 19:59

Hers's possible data.table solution (sorry @rawr)

library(data.table)
setDT(df)[, if (all(!is.na(.SD))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

If you only want to check var1 then

df[, if (all(!is.na(var1))) .SD, ID]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

akrun · Answer 2 · 2014-12-18T20:06:28.583

Assuming that NAs would occur in both var columns,

 df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
 #   ID var1 var2
 #3  2    1    2
 #4  2    2    4

Or if it is only specific to var1

 df[with(df, !ave(is.na(var1), ID, FUN=any)),]
 #  ID var1 var2
 #3  2    1    2
 #4  2    2    4

Or using dplyr

 library(dplyr)
 df %>% 
     group_by(ID) %>%
     filter(all(!is.na(var1)))
 #   ID var1 var2
 #1  2    1    2
 #2  2    2    4

data

 df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
 ), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
 ), class = "data.frame", row.names = c(NA, -4L))

talat · Answer 3 · 2014-12-18T21:42:35.930

3

Here's one more option in base R. It will check all columns for NAs.

df[!df$ID %in% df$ID[rowSums(is.na(df)) > 0],]
#  ID var1 var2
#3  2    1    2
#4  2    2    4

If you only want to check in column "var1" you can do:

df[!with(df, ID %in% ID[is.na(var1)]),]
#  ID var1 var2
#3  2    1    2
#4  2    2    4

edited Dec 18 '14 at 21:42

answered Dec 18 '14 at 20:19

talat

68,970
21
126
157

score 2 · Answer 4 · edited May 23 '17 at 10:26

In the current development version of data.table, there's a new implementation of na.omit for data.tables, which takes a cols =and invert = arguments.

The cols = allows to specify the columns on which to look for NAs. And invert = TRUE returns the NA rows instead, instead of omitting them.

You can install the devel version by following these instructions. Or you can wait for 1.9.6 on CRAN at some point. Using that, we can do:

require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
#    ID var1 var2
# 1:  2    1    2
# 2:  2    2    4

How this works:

setDT converts data.frame to data.table by reference.
setkey sorts the data.table by the columns provided and marks those columns as key columns so that we can perform a join.
na.omit(df, invert = TRUE) gives just those rows that have NA anywhere.
X[!Y] does an anit-join by joining on the key column ID, and returns all the rows that don't match ID = 1 (from Y). Check this post to read in detail about data.table's joins.

HTH

drop levels of factor for which there is one missing value for one column r

4 Answers4

data