Conditionally remove rows from a dataframe that includes NA

Question

My example df:

  a1 a2 a3 a4
1  1  1  4  6
2  1  2  3  2
3  2 NA  5 NA
4  2  5  6  3
5  3  1  1  2
6  3  3  2  6

"If a4 == 6 then delete this row." So, I would like to delete (only!) rows 1 and 6 in this example.

I know this works:

df_1 <- df[-c(1, 6), ]

But I'm looking for a more general solution.

I tried the most obvious way:

attach(df)
df_1 <- df[ which(a4 != 6),]
detach(df)

However, this also deletes the rows where a4 is NA, and I would like to keep them.

  a1 a2 a3 a4
2  1  2  3  2
4  2  5  6  3
5  3  1  1  2

Then I tried:

df_1 <-df[!(df$a4 == 6),]

but then row 3 dances limbo and the whole row becomes NA:

   a1 a2 a3 a4
2   1  2  3  2
NA NA NA NA NA
4   2  5  6  3
5   3  1  1  2

Any ideas? Thank you in advance!

Possible duplicate of [How do I replace NA values with zeros in an R dataframe?](http://stackoverflow.com/questions/8161836/how-do-i-replace-na-values-with-zeros-in-an-r-dataframe) — amonk, May 18 '17 at 12:29
@agerom the link is not a dupe, OP is not trying to replace NAs by anything — Cath, May 18 '17 at 12:43

akrun · Answer 1 · 2017-05-18T06:00:34.847

2

We can use a logical index with is.na to remove only the rows where a4 equals 6:

df[!(df$a4 == 6 & !is.na(df$a4)),]

A logical index like this also returns the whole dataset when no element equals 6.

Or it can be written (as @thelatemail commented)

df[df$a4!=6 | (is.na(df$a4)),]
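To see why both forms keep the NA row, here is a small sketch that rebuilds the example data from the question. The names `drop`, `keep_a` and `keep_b` are only illustrative, and the columns are assumed to be numeric. The point is that `NA & FALSE` evaluates to `FALSE` and `NA | TRUE` evaluates to `TRUE`, so the missing value never reaches the row index.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  a1 = c(1, 1, 2, 2, 3, 3),
  a2 = c(1, 2, NA, 5, 1, 3),
  a3 = c(4, 3, 5, 6, 1, 2),
  a4 = c(6, 2, NA, 3, 2, 6)
)

# df$a4 == 6 is NA where a4 is NA; combining it with !is.na()
# turns that NA into FALSE, so the row is not marked for removal
drop <- df$a4 == 6 & !is.na(df$a4)
keep_a <- df[!drop, ]

# Equivalent form: keep rows where a4 differs from 6 or is missing
keep_b <- df[df$a4 != 6 | is.na(df$a4), ]

identical(keep_a, keep_b)  # both forms select the same rows
rownames(keep_a)           # rows 2, 3, 4 and 5 remain

# When no value matches (7 does not occur in a4), all six rows are kept
nrow(df[!(df$a4 == 7 & !is.na(df$a4)), ])
```

By contrast, a negative index built with `which()`, such as `df[-which(df$a4 == 7), ]`, returns zero rows when nothing matches, which is one reason to prefer the logical index.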

edited May 18 '17 at 06:00

answered May 18 '17 at 05:59

akrun

4

Why the double `!`? `df[df$a4!=6 | is.na(df$a4),]` – thelatemail May 18 '17 at 06:00

score 2 · Accepted Answer · answered May 18 '17 at 12:34


You can use %in% instead of == to properly handle NAs:

df[!(df$a4 %in% 6),]
#  a1 a2 a3 a4
#2  1  2  3  2
#3  2 NA  5 NA
#4  2  5  6  3
#5  3  1  1  2
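A short sketch of why this works, using only the a4 column from the question (the names `eq` and `inn` are just for illustration): `==` propagates NA, while `%in%` is built on `match()` and always returns TRUE or FALSE.

```r
# The a4 column from the question
a4 <- c(6, 2, NA, 3, 2, 6)

eq  <- a4 == 6    # the third element is NA
inn <- a4 %in% 6  # the third element is FALSE

anyNA(eq)   # TRUE:  == leaves an NA in the index
anyNA(inn)  # FALSE: %in% never returns NA

# Negating the %in% result gives a clean logical index for rows 2 to 5
which(!inn)
```

One caveat: `%in%` treats NA as matchable, so `NA %in% NA` is TRUE. That does not matter here because the lookup value is 6.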


Cath

Wonderful! I didn't know something like "%in%" exists. TIL :) Thanks! – KDBoom May 22 '17 at 00:49
