Remove columns from dataframe where some of the values are NA

Question

I have a data frame in which some of the values are NA. I would like to remove the columns that contain them.

My data.frame looks like this:

    v1   v2 
1    1   NA 
2    1    1 
3    2    2 
4    1    1 
5    2    2 
6    1   NA

I tried to compute the column means and keep only the columns whose mean is not NA. I tried the following statement, but it does not work.

data=subset(Itun, select=c(is.na(colMeans(Itun))))

I got an error:

error : 'x' must be an array of at least two dimensions

Can anyone give me some help?

Please add an example of what you would like to have as a result. It would also be really helpful to have a fully reproducible example. — BenBarnes, Sep 17 '12 at 07:43

Sven Hohenstein · Accepted Answer · 2012-09-17T07:33:46.767

71

The data:

Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA))

This will remove all columns containing at least one NA:

Itun[ , colSums(is.na(Itun)) == 0]

An alternative way is to use apply:

Itun[ , apply(Itun, 2, function(x) !any(is.na(x)))]
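One caveat worth sketching: in the example data only one column survives, and `[ , j]` on a data.frame simplifies a single column to a plain vector by default. The snippet below is a minimal illustration, assuming you want to keep a data.frame in that case; the names `keep`, `as_vector` and `as_frame` are made up for this example.

```r
# Example data from the answer above
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# TRUE for every column that contains no NA
keep <- colSums(is.na(Itun)) == 0

# With a single surviving column, `[ , keep]` simplifies to a plain vector
as_vector <- Itun[, keep]

# drop = FALSE preserves the data.frame structure
as_frame <- Itun[, keep, drop = FALSE]
```

List-style indexing, `Itun[keep]`, should behave like the `drop = FALSE` version, since it never simplifies.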

edited Sep 17 '12 at 07:33

answered Sep 17 '12 at 07:25


This will remove rows with `NA`s, not columns. – Backlin Sep 17 '12 at 07:30
@Backlin, but to Sven's benefit, the whole question is really poorly worded and it's not clear what exactly the OP wants to do. Drop the columns? Convert something to zero? – A5C1D2H2I1M1N2O1R2T1 Sep 17 '12 at 07:32
True. But he never says anything about rows and uses `subset(..., select=...)` so I figured he wants to extract all rows for certain columns. – Backlin Sep 17 '12 at 07:36
@SvenHohenstein: Sorry for my poorly organized words. I would like to extract columns without NAs from a dataframe. – TTT Sep 18 '12 at 00:48
doesn't this return a logical array, without subsetting the data? – simone Aug 23 '17 at 13:36
should it be `Itun[ , colSums(is.na(Itun)) == 0, with = FALSE]`? – simone Aug 23 '17 at 13:40
@simone `Itun` is a `data.frame`, not a `data.table`. – Sven Hohenstein Aug 23 '17 at 13:42
@SvenHohenstein sorry I thought I had read data.table in the answer. My mistake – simone Aug 23 '17 at 13:55

score 47 · Answer 2 · edited Dec 31 '19 at 07:51


A convenient way to do it is the dplyr function select_if(). Combining negation (!), any() and is.na() selects every column that does not contain any NA values.

library(dplyr)
Itun %>%
    select_if(~ !any(is.na(.)))

edited Dec 31 '19 at 07:51

jiggunjer


answered Oct 27 '17 at 16:48

Matt Dancho


I was wondering if you can extract the column names of the removed columns simultaneously. Is this possible? – Kots Dec 12 '17 at 17:10

I'd split that into two operations. Use `Itun %>% select_if(~ any(is.na(.))) %>% names()`. Then remove columns in second operation using code above. – Matt Dancho Dec 13 '17 at 19:55
Great solution. For cases where only the columns that consist entirely of NAs should be removed, you can use `select_if(~ !all(is.na(.)))` – JdP Feb 23 '18 at 09:29
This solution is very nice but very slow. Itun[ , colSums(is.na(Itun)) == 0] by @Sven-hohenstein is much faster. – Matthias Munz Aug 13 '20 at 08:27
What does it return though if I wanted to have the columns that have NA/NULL ? When I ran the opposite (i.e. without `!`), it returned a bunch of columns that didn't have NAs; the column that had NA was returned along with them, though. – stucash Nov 15 '21 at 16:59
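The two-step idea from the comments (first record which columns contain NA, then drop them) can also be sketched in base R, which avoids depending on a particular dplyr version. This is only an illustration on the question's data; `has_na`, `removed` and `cleaned` are hypothetical names.

```r
# Example data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# One logical per column: does it contain at least one NA?
has_na <- vapply(Itun, anyNA, logical(1))

# Names of the columns that will be removed
removed <- names(Itun)[has_na]

# The data without those columns (list-style indexing keeps a data.frame)
cleaned <- Itun[!has_na]
```

Keeping the logical vector around means the kept and the removed column sets come from the same test, so they cannot disagree.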

score 17 · Answer 3 · answered Sep 04 '20 at 17:28

Alternatively, select(where(~FUNCTION)) can be used:

library(dplyr)

(df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA)))
#>   x  y  z
#> 1 a NA  1
#> 2 b NA  2
#> 3 c NA  3
#> 4 d NA  4
#> 5 e NA NA

# Remove columns where all values are NA
df %>% 
  select(where(~!all(is.na(.))))
#>   x  z
#> 1 a  1
#> 2 b  2
#> 3 c  3
#> 4 d  4
#> 5 e NA
  
# Remove columns with at least one NA  
df %>% 
  select(where(~!any(is.na(.))))
#>   x
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e

score 13 · Answer 4 · answered Apr 01 '16 at 19:13


You can use transpose twice:

newdf <- t(na.omit(t(df)))
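Note that `t()` works on matrices, so the result of the double transpose is a matrix rather than a data.frame, and a data frame with mixed column types is coerced to a single type along the way. The sketch below illustrates both effects; `mixed` is a made-up example and the other names are hypothetical.

```r
# All-numeric example data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# The double transpose returns a matrix ...
as_matrix <- t(na.omit(t(Itun)))

# ... which can be converted back if a data.frame is needed
as_frame <- as.data.frame(as_matrix)

# With mixed column types, t() coerces everything to character,
# so the numeric column z is no longer numeric afterwards
mixed <- data.frame(x = letters[1:3], z = c(1, 2, 3))
mixed_back <- as.data.frame(t(na.omit(t(mixed))))
```

For data frames that are not purely numeric, one of the column-wise approaches in the other answers is probably the safer choice.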

answered Apr 01 '16 at 19:13

Scott Worland


Backlin · Answer 5 · 2012-09-17T08:05:53.380

6

data[,!apply(is.na(data), 2, any)]

edited Sep 17 '12 at 08:05

answered Sep 17 '12 at 07:27


Shouldn't the `data.frame` version be the same as the `matrix` version, just without the first comma? I get an error (`undefined columns selected`) with your code as it is. – A5C1D2H2I1M1N2O1R2T1 Sep 17 '12 at 07:44

However, `apply` converts the input to a matrix prior to applying the function, so I prefer to use `sapply` or `lapply` on data frames. Then again so does `is.na` so in this case the input is already a matrix and my first example was actually incorrect! Perhaps the conceptually nicest solution is `sapply(data, function(x) !any(is.na(x)))`, but this is really nitpicking. – Backlin Sep 17 '12 at 08:05

score 2 · Answer 6 · answered Feb 03 '17 at 19:30

A base R method related to the apply answers is

Itun[!unlist(vapply(Itun, anyNA, logical(1)))]
  v1
1  1
2  1
3  2
4  1
5  2
6  1

Here, vapply is used because we are operating on a list and, unlike apply, it does not coerce the object into a matrix. Also, since we know that each result will be a logical vector of length 1, we can declare this to vapply and potentially get a small speed boost. For the same reason, I used anyNA instead of any(is.na()).

Oriol Prat · Answer 7 · 2019-07-15T15:50:23.843

2

Another alternative, which needs no extra package, is the base R Filter function:

Filter(function(x) !any(is.na(x)), Itun)

With data.table it is a little more cumbersome:

setDT(Itun)[,.SD,.SDcols=setdiff((1:ncol(Itun)),
                                which(colSums(is.na(Itun))>0))]
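The Filter call above can be written a little more compactly by negating `anyNA` with `Negate()`. This is a stylistic sketch rather than a different method; `no_na_cols` is a hypothetical name.

```r
# Example data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Negate(anyNA) returns TRUE for columns without any NA;
# Filter keeps exactly those columns and returns a data.frame
no_na_cols <- Filter(Negate(anyNA), Itun)
```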

edited Jul 15 '19 at 15:50

answered Jul 15 '19 at 15:44


score 0 · Answer 8 · answered Aug 17 '22 at 04:10


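You can also try the following. Note that it drops only the columns in which every value is NA; columns with just some NAs are kept: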

df <- df[,colSums(is.na(df))<nrow(df)]
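To make the difference between the two conditions concrete, here is a small sketch that reuses the three-column data frame from Answer 3. The variable names are hypothetical, and `drop = FALSE` is added so that a single remaining column stays a data.frame.

```r
# x has no NA, y is entirely NA, z has one NA
df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA))

# Number of NA values per column
na_counts <- colSums(is.na(df))

# Drops only the all-NA column y
not_all_na <- df[, na_counts < nrow(df), drop = FALSE]

# Drops every column with at least one NA (y and z)
no_na <- df[, na_counts == 0, drop = FALSE]
```

The second form matches what the question asks for; the first is the better fit when partially filled columns should survive.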

answered Aug 17 '22 at 04:10
