I have a large data set that needs to be cleaned up. When I compute summary statistics, some values are missing, so I want to remove the observations that have NAs in the specific columns I am interested in.

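# text = parses the string as CSV content rather than as a file name;
# strip.white = TRUE trims the spaces that follow each comma
df <- read.csv(text = "ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
2, NA, 9, NA, 32
3, NA, 8, 78, 33
4, NA, NA, 90, 10
5, 0, 4, 90, 12
6, 0, 4, 99, NA", strip.white = TRUE)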

Let's say I only want to remove rows with NAs in the columns gender, IQ, and testscore, because those are the only ones I need for my analysis; it doesn't matter that education has NAs.

My correctly filtered data, newdf, should look something like this:

ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
3, NA, 8, 78, 33
5, 0, 4, 90, 12
crich

1 Answer

This is fairly straightforward with drop_na() from the tidyr package, which drops every row that has an NA in any of the columns you list:

library(tidyr)
newdf <- df %>% drop_na(education, IQ, testscore)
newdf
#  ID gender education  IQ testscore
# 1  1      0         7 102        18
# 3  3     NA         8  78        33
# 5  5      0         4  90        12
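
If you would rather not depend on tidyr, the same filtering can be sketched in base R with complete.cases(). This is only an illustration, not part of the original answer: the `cols` vector is a name introduced here, and it uses the same columns as the drop_na() call above (education, IQ, testscore), which are the ones that reproduce the expected output in the question.

```r
# Rebuild the example data from the question
df <- read.csv(text = "ID,gender,education,IQ,testscore
1,0,7,102,18
2,NA,9,NA,32
3,NA,8,78,33
4,NA,NA,90,10
5,0,4,90,12
6,0,4,99,NA")

# Columns that must be non-missing (assumed to match the drop_na() call)
cols <- c("education", "IQ", "testscore")

# complete.cases() returns TRUE for rows with no NA in the selected columns
newdf <- df[complete.cases(df[, cols]), ]
newdf
```

Changing `cols` changes which columns are checked; NAs in any column not listed there (gender in this case) are left in place.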
Karsten W.
MrFlick