How to remove observation row where different NA's appear

Question

I have a large data set that needs to be cleaned up. When I perform summary statistics, some data are missing so I want to remove various observations that have NA's in specific variable columns that I am interested in.

df <- read.csv(text = "ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
2, NA, 9, NA, 32
3, NA, 8, 78, 33
4, NA, NA, 90, 10
5, 0, 4, 90, 12
6, 0, 4, 99, NA", strip.white = TRUE)

Let's say I only want to remove NA's in columns gender, IQ, and testscore because I am only interested in these for my analysis, so it doesn't matter that education has NA's.

My correctly filtered data (newdf) should look something like:

ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
3, NA, 8, 78, 33
5, 0, 4, 90, 12

`new<-df[c(!is.na(df$gender) & !is.na(df$IQ) & !is.na(df$testscore)),]` — M.Bergen, May 22 '19 at 15:59

score 0 · Accepted Answer · edited May 22 '19 at 18:01


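This is fairly straightforward with the tidyr package: `drop_na()` removes every row that has an NA in any of the listed columns. The columns below are the ones that reproduce the expected output shown in the question.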

library(tidyr)
newdf <- df %>% drop_na(education, IQ, testscore)
newdf
#  ID gender education  IQ testscore
# 1  1      0         7 102        18
# 3  3     NA         8  78        33
# 5  5      0         4  90        12
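
For comparison, the same filter can be sketched in base R with `complete.cases()`, with no extra packages. This is only an illustration: the `cols` vector is a name introduced here, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's example data; strip.white trims the spaces after commas
df <- read.csv(text = "ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
2, NA, 9, NA, 32
3, NA, 8, 78, 33
4, NA, NA, 90, 10
5, 0, 4, 90, 12
6, 0, 4, 99, NA", strip.white = TRUE)

# Columns that must be non-missing (hypothetical helper vector; edit as needed)
cols <- c("education", "IQ", "testscore")

# complete.cases() is TRUE for rows with no NA in the selected columns
newdf <- df[complete.cases(df[, cols]), ]
newdf
```

Changing `cols` to `c("gender", "IQ", "testscore")` applies the same filter to the columns named in the question text instead.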

edited May 22 '19 at 18:01 · Karsten W.

answered May 22 '19 at 16:04 · MrFlick
