Exclude rows that contain NA in a particular column in subsets

Question

I am trying to exclude rows of a subset that contain an NA in a particular column that I choose. I have a CSV spreadsheet of survey data organized like this, for instance:

name    idnum   term    type      q2    q3
bob     0321    1       2         0     .
.       .       3       1         5     3
ron     .       2       4         2     1
.       2561    4       3         4     2

When I set up my R workspace, I read the file with data <- read.csv(..., na.strings='.'). For my analysis, I then created subsets by term and type, for example set13 <- subset(data, term == 1 & type == 2). When I tried to conduct t-tests, I noticed that the function threw out any instance of NA, effectively cutting my sample size in half.

For my analysis, I want to exclude responses that are missing survey items, such as Bob from my example, missing question 3. But I still want to include rows that have one or more NAs in the name or idnum columns. So, in essence, I want to pick by columns which NAs are omitted. (Keep in mind, this is just an example - my actual CSV has about 1000 rows, so each subset may contain 100-150 rows.)

I know this can be done using data frames, but I'm not sure how to incorporate that into my given subset format. Is there a way to do this?

score 5 · Accepted Answer · edited May 23 '17 at 10:29

Check out complete.cases as shown in the answer to this SO post.

data[complete.cases(data[,3:6]),]

This will return all rows with complete information in columns 3 through 6.
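As a minimal sketch of how this fits the subset workflow from the question, the example below rebuilds the four sample rows by hand (the names `survey`, `answered`, and `set_t3` are made up for illustration; the original data comes from a CSV file). It selects the survey items by name rather than by position, so NAs in name and idnum are ignored.

```r
# Hand-built copy of the question's sample rows; "." in the CSV becomes NA.
survey <- data.frame(
  name  = c("bob", NA, "ron", NA),
  idnum = c("0321", NA, NA, "2561"),
  term  = c(1, 3, 2, 4),
  type  = c(2, 1, 4, 3),
  q2    = c(0, 5, 2, 4),
  q3    = c(NA, 3, 1, 2)
)

# Keep rows with no NA in the survey items only (q2 and q3).
answered <- survey[complete.cases(survey[, c("q2", "q3")]), ]

# The same condition can go inside subset() next to the term/type filter.
set_t3 <- subset(survey, term == 3 & type == 1 & complete.cases(q2, q3))
```

Here bob's row is dropped because q3 is missing, while the rows with a missing name or idnum are kept.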

answered Jun 16 '16 at 01:37

Rilcon42

`data[complete.cases(data[,c(3,7,8)]),]` would return the rows that are complete in columns 3, 7, and 8. Is that what you wanted? – Rilcon42 Jun 16 '16 at 01:43
Yep, thanks. I think this will solve it, but I can't be sure until I've had some more time to work on the code/stats. – zh1 Jun 16 '16 at 01:58

milan · Answer 2 · 2016-06-16T02:16:45.140

score 3

Another approach.

data[rowSums(is.na(data[,3:6]))==0,]
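To spell out what this one-liner does, here is a small sketch on a hand-built copy of the question's sample rows (the names `survey`, `na_count`, and `kept` are hypothetical). `is.na()` on a data frame gives a logical matrix, and `rowSums()` counts the TRUE values in each row, so a count of zero means the row is complete in the chosen columns.

```r
# Hand-built copy of the question's sample rows; "." in the CSV becomes NA.
survey <- data.frame(
  name  = c("bob", NA, "ron", NA),
  idnum = c("0321", NA, NA, "2561"),
  term  = c(1, 3, 2, 4),
  type  = c(2, 1, 4, 3),
  q2    = c(0, 5, 2, 4),
  q3    = c(NA, 3, 1, 2)
)

# Number of missing values per row, counted over columns 3 to 6 only.
na_count <- rowSums(is.na(survey[, 3:6]))

# Rows with zero missing values in those columns.
kept <- survey[na_count == 0, ]
```

One reason to prefer the count form is that the threshold can be relaxed: `survey[na_count <= 1, ]` would keep rows with at most one missing item.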

edited Jun 16 '16 at 02:16

answered Jun 16 '16 at 02:07

score 3 · Answer 3 · answered Jun 16 '16 at 02:55

Another option is

data[!Reduce(`|`, lapply(data[3:6], is.na)),]
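A short sketch of how this works, again on a hand-built copy of the question's sample rows (the names `survey`, `has_na`, `kept`, and `same_rows` are hypothetical). `lapply()` produces one logical vector per column, and `Reduce()` ORs them together element by element, so the result marks every row with at least one NA in the chosen columns.

```r
# Hand-built copy of the question's sample rows; "." in the CSV becomes NA.
survey <- data.frame(
  name  = c("bob", NA, "ron", NA),
  idnum = c("0321", NA, NA, "2561"),
  term  = c(1, 3, 2, 4),
  type  = c(2, 1, 4, 3),
  q2    = c(0, 5, 2, 4),
  q3    = c(NA, 3, 1, 2)
)

# TRUE for each row that has an NA in any of columns 3 to 6.
has_na <- Reduce(`|`, lapply(survey[3:6], is.na))

# Negate the flag to keep only the complete rows.
kept <- survey[!has_na, ]

# Compare with the complete.cases() approach on the same columns.
same_rows <- isTRUE(all.equal(kept, survey[complete.cases(survey[, 3:6]), ]))
```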

akrun

Why did you propose this particular solution? Is it faster in general or best practice? (Knowing your typical responses I assume it is better for some reason) – Rilcon42 Jun 17 '16 at 02:21
@Rilcon42 Your solution should be faster as it uses `complete.cases`. This is just another option. – akrun Jun 17 '16 at 02:39
