I am using R to generate examples of how to deal with missing data for a statistics class I am teaching. One method requires a "missing values binary variable" that is 0 for cases containing missing values and 1 for cases with none. For example:

n  X   Y     Z
1  4   300   2
2  8   400   4
3  10  500   7
4  18  NA    10
5  20  50    NA
6  NA  1000  5

I would like to generate a variable M, such that

n  M
1  1
2  1
3  1
4  0
5  0
6  0

It seems this should be simple, given R's ability to handle missing values. The closest I have found is m <- ifelse(is.na(missguns), 0, 1), but that just produces a whole matrix of 0s and 1s, one per cell, flagging missingness. I want a single variable indicating whether each row contains any missing values.

Jilber Urbina
jeramy townsley

1 Answer

complete.cases does exactly what you want.

complete.cases(x)
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

You can coerce to numeric or integer:

as.integer(complete.cases(x))
## [1] 1 1 1 0 0 0
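
For a reproducible version, here is a minimal sketch that rebuilds the example table from the question and attaches the indicator as a column. The data frame name x follows the calls above; the column name M and the helper vector m_alt are assumptions chosen for illustration. The rowSums(is.na(...)) line is an equivalent alternative that collapses the per-cell matrix from is.na() into one value per row.

```r
# Sample data from the question; `x` matches the name used in the answer.
x <- data.frame(
  n = 1:6,
  X = c(4, 8, 10, 18, 20, NA),
  Y = c(300, 400, 500, NA, 50, 1000),
  Z = c(2, 4, 7, 10, NA, 5)
)

# 1 = row has no missing values, 0 = row has at least one NA.
x$M <- as.integer(complete.cases(x))

# Alternative (illustrative): count the NAs in each row of the per-cell
# matrix returned by is.na(), then flag rows where the count is zero.
m_alt <- as.integer(rowSums(is.na(x[, c("X", "Y", "Z")])) == 0)

print(x[, c("n", "M")])
```

Both approaches should agree on this data; complete.cases is the more direct choice when every column counts toward completeness.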
Matthew Lundberg
  • Thanks, that did the trick! As an update, I was implementing Rubin's t-test; here is the code I wrote. The dataset is "missguns" (the "guns" dataset with missing values added), and one of its variables is "urban".
    missing <- as.integer(complete.cases(missguns))
    practice <- cbind(missguns, missing)
    missing <- practice[practice$missing == 0, ]
    complete <- practice[practice$missing == 1, ]
    t.test(missing$urban, complete$urban)
    – jeramy townsley May 27 '13 at 02:39
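
The comparison in the comment above can also be written with the formula interface of t.test, which avoids splitting the data into two separate frames. This is only a sketch: the missguns data is not shown here, so the toy data frame and its values below are made up for illustration, and only the column name urban comes from the comment.

```r
# Hypothetical stand-in for `missguns`; the values are invented.
toy <- data.frame(
  urban = c(10, 12, 15, 11, 20, 22, 19, 25),
  other = c(1, 2, NA, 4, 5, NA, NA, 8)
)

# 1 = complete row, 0 = row with at least one missing value.
toy$missing <- as.integer(complete.cases(toy))

# Welch two-sample t-test of `urban` between incomplete and complete rows.
res <- t.test(urban ~ missing, data = toy)
print(res)
```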