Find columns with all missing values

Question

I am writing a function, which needs a check on whether (and which!) column (variable) has all missing values (NA, <NA>). The following is fragment of the function:

test1 <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
test2 <- data.frame (matrix(c(1,2,3,NA,NA,NA,NA,NA,2), 3,3))

na.test <-  function (data) {
  if (colSums(!is.na(data) == 0)){
      stop ("The some variable in the dataset has all missing value,
     remove the column to proceed")
      }
      }
na.test (test1)

Warning message:
In if (colSums(!is.na(data) == 0)) { :
  the condition has length > 1 and only the first element will be used

Q1: Why is the above error and any fixes ?

Q2: Is there any way to find which of columns have all NA, for example output the list (name of variable or column number)?

I feel like this must have been answered before: something like `names(data)[lapply(data,function(x) all(is.na(x)))]` — Ben Bolker, Jul 04 '12 at 13:35
Your parentheses are wrong. Write `colSums(!is.na(data)) == 0` instead of `colSums(!is.na(data) == 0)`. — sgibb, Jul 04 '12 at 13:40

Andrie · Accepted Answer · 2012-07-04T13:53:04.077

44

This is easy enough to with sapply and a small anonymous function:

sapply(test1, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE FALSE FALSE 

sapply(test2, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE  TRUE FALSE

And inside a function:

na.test <-  function (x) {
  w <- sapply(x, function(x)all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse=", ")))
  }
}

na.test(test1)

na.test(test2)
Error in na.test(test2) : All NA in columns 2

edited Jul 04 '12 at 13:53

answered Jul 04 '12 at 13:40

Andrie

176,377
47
447
496

thanks, I intention here to put as part of function, so that the function stop if there are any TRUE...which is important for me as I have big dataset with > 50000 variables – SHRram Jul 04 '12 at 13:46
You may also be able to reduce the number of computations by noting that a columns comprised of only `NA` will be of `class` `logical`. Further, if you don't expect any other logical columns then this gives you the NA columns. – James Jul 04 '12 at 14:04
2

@James: that might be fragile. There are 'classed' versions of `NA`, such as `NA_integer_`, which show up as `NA` but are not logical. It might depend on where the `NA`s came from ... – Ben Bolker Jul 04 '12 at 14:37
Is it me, or will this solution only return "TRUE" when ALL elements of the column are NA, rather than the initial request (that it return true if ANY element was NA)? I just tried it on a data.frame that has two known rows with NAs in them, and got a return set telling me that there were no NAs in any column. – GT. Nov 16 '13 at 01:53
@GT. Yes, that was the question as posed. If you want to return TRUE if one or more values are missing, use `any()` instead of `all()`. – Andrie Nov 16 '13 at 08:30

Tony Ladson · Answer 2 · 2017-03-08T11:12:19.687

In dplyr

ColNums_NotAllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) != nrow(df)))
}

df %>%
select(ColNums_NotAllMissing(.))

example:
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

x %>%
select(ColNums_NotAllMissing(.))

or, the other way around

Cols_AllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) == nrow(df)))
}


x %>%
  select(-Cols_AllMissing(.))

score 7 · Answer 3 · edited Feb 25 '15 at 09:11

7

To find the columns with all values missing

 allmisscols <- apply(dataset,2, function(x)all(is.na(x)));  
 colswithallmiss <-names(allmisscols[allmisscols>0]);    
 print("the columns with all values missing");    
 print(colswithallmiss);

edited Feb 25 '15 at 09:11

zx8754

52,746
12
114
209

answered Feb 25 '15 at 08:17

Choukha Ram

81
1
4

AlexB · Answer 4 · 2019-09-09T15:14:52.607

6

This one will generate the column names that are full of NAs:

library(purrr)
df %>% keep(~all(is.na(.x))) %>% names

edited Sep 09 '19 at 15:14

answered Sep 09 '19 at 14:29

AlexB

3,061
2
17
19

PHH13 · Answer 5 · 2016-06-07T02:06:13.680

4

To test whether columns have all missing values:

apply(test1,2,function(x) {all(is.na(x))})

To get which columns have all missing values:

  test1.nona <- test1[ , colSums(is.na(test1)) == 0]

edited Jun 07 '16 at 02:06

answered Jun 07 '16 at 01:52

PHH13

53
8

score 3 · Answer 6 · answered Jun 13 '19 at 07:59

3

dplyr approach to finding the number of NAs for each column:

df %>% 
  summarise_all((funs(sum(is.na(.)))))

answered Jun 13 '19 at 07:59

Moh

188
8

1

the updated method is now `df %>% summarise_all((~ sum(is.na(.))))` – Emman Aug 26 '20 at 08:38

score 1 · Answer 7 · answered Feb 20 '21 at 15:10

1

Variant dplyr approach:

dataframe %>% select_if(function(x) all(is.na(x))) %>% colnames()

answered Feb 20 '21 at 15:10

Arthur Small

621
1
6
15

Hat tip: https://sebastiansauer.github.io/NAs-with-dplyr/ – Arthur Small Feb 20 '21 at 15:12

score 0 · Answer 8 · answered Dec 17 '17 at 12:43

0

The following command gives you a nice table with the columns that have NA values:

sapply(dataframe, function(x)all(any(is.na(x))))

It's an improvement for the first answer you got, which doesn't work properly from some cases.

answered Dec 17 '17 at 12:43

cs4r

584
6
15

score 0 · Answer 9 · edited Jun 28 '20 at 07:34

0

sapply(b,function(X) sum(is.na(X))

This will give you the count of na in each column of the dataset and also will give 0 if there is no na present in the column

edited Jun 28 '20 at 07:34

David Buck

3,752
35
31
35

answered Jun 28 '20 at 07:01

Abhi

101
1
3

```sapply(b,function(X) sum(is.na(X)))``` – William Jun 01 '22 at 23:45

Find columns with all missing values

9 Answers9

Linked

Related