I am trying to exclude rows of a subset that contain an NA in a particular column that I choose. I have a CSV spreadsheet of survey data with this kind of organization, for instance:
name idnum term type q2 q3
bob 0321 1 2 0 .
. . 3 1 5 3
ron . 2 4 2 1
. 2561 4 3 4 2
When I was creating my R workspace, I read the file with data <- read.csv(..., na.strings='.'). For purposes of my analysis, I then created subsets by term and type, like set13 <- subset(data, term == 1 & type == 2), for example. When I tried to conduct t-tests, I noticed that the function threw out any instance of NA, effectively cutting my sample size in half.
For my analysis, I want to exclude responses that are missing survey items, such as Bob from my example, who is missing question 3. But I still want to include rows that have one or more NAs in the name or idnum columns. So, in essence, I want to choose which columns' NAs cause a row to be omitted. (Keep in mind, this is just an example: my actual CSV has about 1000 rows, so each subset may contain 100-150 rows.)
I know this can be done using data frames, but I'm not sure how to incorporate that into my given subset format. Is there a way to do this?
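To make the goal concrete, here is a minimal sketch of the behavior I am after, run on the toy data from my example. It assumes base R's complete.cases() applied to only the survey columns is a reasonable way to do this. The comma-separated text and the names csv_text, survey_cols, keep, data_clean and set_example are made up for illustration and are not from my real workspace.

```r
# Toy data matching the example above; "." marks a missing value.
# (Assumption: the real file is comma-separated like this.)
csv_text <- "name,idnum,term,type,q2,q3
bob,0321,1,2,0,.
.,.,3,1,5,3
ron,.,2,4,2,1
.,2561,4,3,4,2"

data <- read.csv(text = csv_text, na.strings = ".")

# Only NAs in these columns should disqualify a row.
survey_cols <- c("q2", "q3")

# complete.cases() is TRUE for rows with no NA in the columns passed to it,
# so NAs in name or idnum are ignored here.
keep <- complete.cases(data[, survey_cols])
data_clean <- data[keep, ]

# Bob's row (missing q3) is dropped; rows with NA in name or idnum stay.
# Subsetting by term and type then works as before.
set_example <- subset(data_clean, term == 3 & type == 1)
```

If something like this is right, I would apply the same filter to the full data before building each subset, but I don't know whether this is the idiomatic way to do it.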