Indicator Variable for Missing Values

Question

I need to create a new column in my wbpol dataset named wbpol$missing.

This column should contain 1 if there is an NA in any of the other columns for that row and 0 if there are no NAs in the other columns of the row.

This is my current code:

wbpol$missing<-ifelse(apply(wbpol, 1, anyNA), TRUE == 1, FALSE == 0)

When I run the code, however, every value of wbpol$missing is TRUE. I need it to be 1 if there is an NA in the other columns of that row and 0 if there is not.

How do I do this?

Maybe `wbpol$missing <- !complete.cases(wbpol)` will do what you want. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Dec 19 '20 at 02:37

score 0 · Accepted Answer · answered Dec 19 '20 at 02:44

In an ifelse call, the second and third arguments are the values to return where the test in the first argument is true or false, respectively.

In this case, you have set the expression TRUE == 1 to be evaluated in the case that the statement is true, and the expression FALSE == 0 to be evaluated in the case the statement is false. However both TRUE == 1 and FALSE == 0 evaluate to TRUE, which is why your column is filled with TRUE. You can see this if you enter TRUE == 1 or FALSE == 0 in the R console.
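A minimal sketch of that behaviour, using a made-up logical vector `test` in place of the result of `apply(wbpol, 1, anyNA)`:

```r
# Comparing a logical with a number coerces the logical to numeric first
TRUE == 1    # TRUE, because TRUE is coerced to 1
FALSE == 0   # TRUE, because FALSE is coerced to 0

# `test` is a hypothetical stand-in for apply(wbpol, 1, anyNA)
test <- c(TRUE, FALSE, TRUE)

# Both branches are the single value TRUE, so every element comes back TRUE
res <- ifelse(test, TRUE == 1, FALSE == 0)
res
```

Whatever `test` contains, both the "yes" and the "no" value are TRUE, so the result cannot vary.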

Instead, pass the values 1 and 0 directly as the second and third arguments. For example, the following returns 1 where the test is true and 0 where it is false:

wbpol$missing<-ifelse(apply(wbpol, 1, anyNA), 1, 0)
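Since wbpol itself is not shown in the question, here is a small sketch on a hypothetical data frame `toy` (the name and values are invented for illustration). It also shows `as.integer()` as one possible alternative to `ifelse` when the test is already a logical vector:

```r
# Hypothetical stand-in for wbpol: rows 2 and 3 each contain one NA
toy <- data.frame(a = c(1, NA, 3),
                  b = c(4, 5, NA),
                  c = c(7, 8, 9))

# One logical value per row: does the row contain any NA?
has_na <- apply(toy, 1, anyNA)

# Same pattern as the corrected line above
toy$missing <- ifelse(has_na, 1, 0)

# Alternative: a logical converts directly to 0/1
toy$missing_int <- as.integer(has_na)

toy
```

Computing `has_na` before adding the new columns keeps the indicator itself out of the NA check, which matters if the line is ever re-run on data where the indicator could be NA.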

score 0 · Answer 2 · answered Dec 19 '20 at 10:41

The apply approach can get slow with large data. A better option is to check where the rowMeans of the missing-value indicators are greater than zero.

dat$miss <- +(rowMeans(is.na(dat)) > 0)
dat
   V1 V2 V3 V4 V5 miss
1   1  1 NA  1  1    1
2   1  1  1  1  1    0
3  NA  1  1  1  1    1
4   1 NA  1  1  1    1
5  NA NA  1 NA  1    1
6   1 NA  1  1  1    1
7   1 NA NA  1 NA    1
8   1  1  1  1  1    0
9   1  1 NA  1 NA    1
10  1  1  1  1  1    0
11  1  1  1 NA  1    1
12  1  1  1  1  1    0
13  1  1  1  1  1    0
14  1 NA  1 NA  1    1
15  1  1  1  1  1    0
16  1 NA  1  1  1    1
17  1 NA  1  1  1    1
18 NA  1  1  1  1    1
19  1  1  1  1  1    0
20 NA  1  1  1 NA    1
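As a rough check, the sketch below compares the rowMeans idiom with a few equivalent formulations on a small hypothetical data frame (`small` and its values are invented here, not taken from the answer). The unary `+` simply converts the logical result to integer 0/1:

```r
# Hypothetical data: rows 1 to 3 contain at least one NA, row 4 is complete
small <- data.frame(V1 = c(1, NA, 1, 1),
                    V2 = c(1, 1, NA, 1),
                    V3 = c(NA, 1, NA, 1))

by_means <- +(rowMeans(is.na(small)) > 0)   # idiom from this answer
by_sums  <- +(rowSums(is.na(small)) > 0)    # same idea with counts
by_cc    <- +(!complete.cases(small))       # idiom from the comment
by_apply <- +(apply(small, 1, anyNA))       # row-wise apply version

# Drop names before comparing, since some variants carry row names
same <- all(unname(by_means) == unname(by_sums),
            unname(by_means) == unname(by_cc),
            unname(by_means) == unname(by_apply))
same
```

All four give the same indicator here; the vectorised versions (`rowMeans`, `rowSums`, `complete.cases`) avoid calling a function once per row.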

Warning: If you plan to use the dummy variable adjustment approach to account for missing data, you should know that it gives biased results. See Allison, Paul D. 2002. Missing Data. SAGE Publications, Inc. Use multiple imputation or non-parametric imputation instead, or consult a local statistician.

Data:

dat <- matrix(1, 20, 5)
set.seed(42)
dat[sample(length(dat), length(dat)*.2)] <- NA
dat <- as.data.frame(dat)
