
I am trying to exclude rows of a subset that contain an NA in particular columns that I choose. I have a CSV spreadsheet of survey data organized like this, for instance:

name    idnum   term    type      q2    q3
bob     0321    1       2         0     .
.       .       3       1         5     3
ron     .       2       4         2     1
.       2561    4       3         4     2

When I was creating my R workspace, I read the file with data <- read.csv(..., na.strings='.'). For my analysis, I then created subsets by term and type, for example set13 <- subset(data, term == 1 & type == 2). When I tried to run t-tests, I noticed that the function threw out every row containing an NA, effectively cutting my sample size in half.

For my analysis, I want to exclude responses that are missing survey items, such as Bob in my example, who is missing question 3. But I still want to include rows that have one or more NAs in the name or idnum columns. In essence, I want to choose which columns' NAs cause a row to be omitted. (Keep in mind, this is just an example: my actual CSV has about 1000 rows, so each subset may contain 100-150 rows.)

I know this can be done using data frames, but I'm not sure how to incorporate that into my given subset format. Is there a way to do this?

zh1

3 Answers


Check out complete.cases as shown in the answer to this SO post.

data[complete.cases(data[,3:6]),]

This will return all rows with complete information in columns 3 through 6.
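As a rough sketch of how this might fit the subset workflow from the question, the example below rebuilds the sample data inline and selects the survey columns by name rather than by position. The names `csv_text`, `survey_cols`, `complete_rows`, and `set24` are made up for illustration, and the choice of q2 and q3 as the survey items is an assumption based on the example table.

```r
# Example data mirroring the question; "." marks a missing value.
csv_text <- "name,idnum,term,type,q2,q3
bob,0321,1,2,0,.
.,.,3,1,5,3
ron,.,2,4,2,1
.,2561,4,3,4,2"

data <- read.csv(text = csv_text, na.strings = ".")

# Keep rows whose survey items are all present; NAs in name/idnum are ignored.
survey_cols <- c("q2", "q3")  # assumed names of the survey-item columns
complete_rows <- data[complete.cases(data[, survey_cols]), ]

# The same check can sit inside subset(), next to the term/type conditions.
set24 <- subset(data, term == 2 & type == 4 & complete.cases(q2, q3))
```

Selecting columns by name instead of `3:6` keeps the filter working if the column order in the CSV changes.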

Rilcon42
  • `data[complete.cases(data[,c(3,7,8)]),]` would return the complete cases for columns 3, 7, and 8. Is that what you wanted? – Rilcon42 Jun 16 '16 at 01:43
  • Yep, thanks. I think this will solve it, but I can't be sure until I've had some more time to work on the code/stats. – zh1 Jun 16 '16 at 01:58

Another approach.

data[rowSums(is.na(data[,3:6]))==0,]
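To make the one-liner easier to follow, here is a minimal sketch that breaks it into steps on a small made-up data frame. The names `df`, `na_flags`, `na_count`, and `kept` are hypothetical, and columns 3:4 stand in for the survey items.

```r
# Small made-up data frame; columns 3:4 play the role of the survey items.
df <- data.frame(
  name = c("bob", NA, "ron"),
  term = c(1, 3, 2),
  q2   = c(0, 5, NA),
  q3   = c(NA, 3, 1)
)

na_flags <- is.na(df[, 3:4])     # logical matrix, TRUE where a value is missing
na_count <- rowSums(na_flags)    # number of missing survey items per row
kept     <- df[na_count == 0, ]  # rows with no missing survey items
```

Because `na_count` is a per-row count, the same pattern could tolerate some missing items by changing the comparison, for example `na_count <= 1`.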
milan

Another option is

data[!Reduce(`|`, lapply(data[3:6], is.na)),]
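The following sketch, on a made-up data frame, shows what the `Reduce` call is doing and checks that it selects the same rows as the other two approaches in this thread. The names `df`, `cols`, `by_complete`, `by_rowsums`, `by_reduce`, and `same_rows` are hypothetical.

```r
# Made-up data: rows 1 and 3 are each missing one survey item.
df <- data.frame(
  name = c("bob", NA, "ron", NA),
  q2   = c(0, 5, NA, 4),
  q3   = c(NA, 3, 1, 2)
)
cols <- c("q2", "q3")

by_complete <- df[complete.cases(df[, cols]), ]
by_rowsums  <- df[rowSums(is.na(df[, cols])) == 0, ]

# lapply() yields one logical vector per column; Reduce() ORs them together,
# so a row is flagged if any of its selected columns is NA.
by_reduce <- df[!Reduce(`|`, lapply(df[cols], is.na)), ]

# Compare which original rows each approach kept.
same_rows <- identical(rownames(by_complete), rownames(by_rowsums)) &&
  identical(rownames(by_complete), rownames(by_reduce))
```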
akrun
  • Why did you propose this particular solution? Is it faster in general or best practice? (Knowing your typical responses I assume it is better for some reason) – Rilcon42 Jun 17 '16 at 02:21
  • @Rilcon42 Your solution should be faster, as it uses `complete.cases`. This is just another option. – akrun Jun 17 '16 at 02:39