Removing specific rows with some NA values in a data frame

Question

I want to remove rows from a data frame where a column has NA only if the other rows where the NA value is found matches others value in the data frame

For example,

df <- data.frame(ID = c(1,1,2,2),DAY=c(1,1,2,3), VAL=c(1,NA,NA,5))

I want to remove the second row because there is a missing value in VAL and there is already a value for VAL with ID = 1 and DAY = 1 to get

ID      DAY     VAL
1        1        1
2        2        NA
2        3        5

Any idea how to do this? I could try writing a loop, but that doesn't seem efficient.

possible duplicate of [How to remove "rows" with a NA value?](http://stackoverflow.com/questions/6138776/how-to-remove-rows-with-a-na-value) — MichaelChirico, Jul 28 '15 at 20:26
@MichaelChirico I didn't find it as a duplicate because `na.omit` is more general and it omits all the rows with the `NA` values. But, here the OP wants to remove NA values selectively by group. — akrun, Jul 28 '15 at 20:33

akrun · Accepted Answer · 2015-07-28T20:22:00.463

3

This could be done by creating a logical condition within each grouping variable. Here, I am guessing 'ID' and 'DAY' as the grouping variable.

One option is using data.table. We convert the 'data.frame' to 'data.table' (setDT(df)). Grouped by 'ID', 'DAY', we get the row index (.I) of 'VAL' that satisfy the condition (sum(is.na(VAL))!= .N & is.na(VAL)) and remove the that row index from the dataset 'df'. The sum(is.na(VAL))!= .N gives a logical vector that checks whether the number of NA values in a group is not equal to number of rows in that group (.N). If the 'VAL' is also an 'NA' along with the previous condition, then that is removed.

library(data.table)
i1 <- setDT(df)[, .I[sum(is.na(VAL))!=.N & is.na(VAL)] , by = .(ID, DAY)]$V1
df[-i1]
#   ID DAY VAL
#1:  1   1   1
#2:  2   2  NA
#3:  2   3   5

Or a similar option with dplyr. We group by 'ID', DAY', and then use filter with the above mentioned condition.

library(dplyr)
df %>% 
   group_by(ID, DAY) %>% 
   filter(!(sum(is.na(VAL))!=n() & is.na(VAL)))
#  ID DAY VAL
#1  1   1   1
#2  2   2  NA
#3  2   3   5

edited Jul 28 '15 at 20:22

answered Jul 28 '15 at 19:56

akrun

874,273
37
540
662

2

Maybe I am not reading carefully enough but, why are the two solutions returning different results? – mpalanco Jul 28 '15 at 20:18
2

@mplanco Because there is a small error in the `dplyr` code. It should read: `df %>% group_by(ID, DAY) %>% filter(!(sum(is.na(VAL))!=n() & is.na(VAL)))` – Steven Beaupré Jul 28 '15 at 20:19
1

@StevenBeaupré Thanks, I was trying something different and got a different result. – akrun Jul 28 '15 at 20:21
@mpalanco Thanks for your comment. I didn't even notice it. – akrun Jul 28 '15 at 20:22

Removing specific rows with some NA values in a data frame

1 Answers1