Omit rows containing NA in specific columns

Question

I want to know how to omit NA values in a data frame, but only in some columns I am interested in.

For example,

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

but I only want to omit the data where y is NA, therefore the result should be

  x  y  z
1 1  0 NA
2 2 10 33

na.omit seems to delete all rows that contain any NA.

Can somebody help me out with this simple question?

Now suppose I change the data to:

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

If I want to omit only the rows where x is NA or z is NA, where do I put the | in the function call?

score 241 · Answer 1 · answered Jun 29 '12 at 00:06

Use is.na

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]

mnel

How do you apply this approach greedily on all columns in the data set? If any of the column value is NA skip. So your data set output is the second column only. – Léo Léopold Hertz 준영 Jul 18 '17 at 15:35
Use `na.omit` to greedily remove all rows with NA in any column `na.omit(DF)` – M.Viking Aug 21 '19 at 18:50

score 105 · Answer 2 · answered Aug 16 '16 at 18:37

Hadley's tidyr just got this amazing function drop_na

library(tidyr)
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))
DF %>% drop_na(y)
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

amrrs

This method also allows you to specify more than one column (for dropping NA values). For instance, one could use DF %>% drop_na(y,z) to remove NA values in both columns, y, and z. – SolingerStuebchen Sep 23 '20 at 09:59
@SolingerStuebchen can you pass a list for the columns to drop? – queste Mar 24 '23 at 03:32
@queste yes, that is possible. You can do the following to drop NA values in multiple columns. First, define a character vector of the columns to be checked: drop_list <- c("y","z"). Second, call DF %>% drop_na(drop_list). – CausalQuestions Apr 02 '23 at 22:53
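
When the column names are stored in a character vector such as drop_list, recent tidyselect versions emit a deprecation warning for the bare vector and suggest wrapping it in all_of(). A minimal sketch, assuming tidyr and its dependency tidyselect are installed:

```r
library(tidyr)

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Character vector naming the columns to check for NA
drop_list <- c("y", "z")

# all_of() marks drop_list as an external vector of column names
res <- drop_na(DF, tidyselect::all_of(drop_list))
print(res)
```

Only the second row has values in both y and z, so it should be the only row left.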

score 95 · Accepted Answer · edited Jul 18 '17 at 15:49

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))

edited Jul 18 '17 at 15:49

answered Jun 29 '12 at 08:08

BenBarnes

Can you make your approach greedy? Take all columns that do not have NAs at all. – Léo Léopold Hertz 준영 Jul 18 '17 at 15:33
You mean just return *rows* with no `NA`s? Like `completeFun(DF, names(DF))`? – BenBarnes Jul 18 '17 at 15:39
Correct! Please, consider adding it to your answer because it is a common need here. - - I think mnel's answer cannot be expanded as yours. Your function approach is great! – Léo Léopold Hertz 준영 Jul 18 '17 at 15:43
Done! Thx for the tip @LéoLéopoldHertz준영 – BenBarnes Jul 18 '17 at 15:50
If you are viewing this past 2020 do yourself a favor and look at the more recent answers given below, for example the approach outlined by @amrrs below using `drop_na()` from `tidyr` does the same thing but is in my opinion a better solution today. – Ricky Oct 18 '20 at 15:38

score 36 · Answer 4 · answered Jun 12 '13 at 19:00

Use 'subset'

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))

Rnoob

score 17 · Answer 5 · edited Jun 26 '20 at 16:57

If your data is a data.table, na.omit accepts a cols argument that restricts the NA check to the given columns:

na.omit(data, cols = c("x", "z"))
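
A self-contained sketch of the same call, assuming the data.table package is installed; DT is an illustrative name for the converted table:

```r
library(data.table)

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))
DT <- as.data.table(DF)  # cols= only works on a data.table

# Only x and z are checked, so the row whose sole NA is in y is kept
res <- na.omit(DT, cols = c("x", "z"))
print(res)
```

Here the first row is dropped because z is NA, while the third row survives even though its y is NA.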

edited Jun 26 '20 at 16:57

M--

answered Feb 28 '19 at 11:48

Droney


The `cols=` argument is provided by the data.table method for `na.omit` (`na.omit.data.table`), not by the base `stats::na.omit` methods. – M.Viking Aug 21 '19 at 18:39

score 7 · Answer 6 · answered Aug 21 '19 at 18:44

Omit a row if either of two specific columns contains NA.

DF[!is.na(DF$x) & !is.na(DF$z), ]
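
This also shows where the | from the question can go: keeping rows where x is not NA and z is not NA is the same as dropping rows where x is NA or z is NA. A small check with the four-row DF from the question (keep_and and keep_or are illustrative names):

```r
DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Keep rows where neither x nor z is NA
keep_and <- DF[!is.na(DF$x) & !is.na(DF$z), ]

# Same filter written with |: negate "x is NA or z is NA"
keep_or <- DF[!(is.na(DF$x) | is.na(DF$z)), ]

print(keep_or)
#   x  y  z
# 1 1  1 43
# 3 3 10 33
```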

M.Viking

score 3 · Answer 7 · answered Jun 29 '12 at 01:33

Try this:

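cc <- is.na(DF$y)                  # TRUE where y is NA
m <- which(cc)                     # indices of the rows to drop
if (length(m) > 0) DF <- DF[-m, ]  # guard: with an empty m, DF[-m, ] would return zero rows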
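
Removing rows by negative index has an edge case worth knowing: when the column has no NA at all, which() returns an empty vector, and indexing with it selects no rows instead of all of them. A small illustration, using a hypothetical data frame without missing values:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, 5))  # y has no NA here

m <- which(is.na(DF$y))      # integer(0): nothing to remove
bad <- DF[-m, ]              # -integer(0) is still integer(0), which selects zero rows
good <- DF[!is.na(DF$y), ]   # logical indexing keeps all three rows

nrow(bad)   # 0
nrow(good)  # 3
```

Logical indexing with !is.na() avoids the problem without an explicit length check.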

rockswap

score 2 · Answer 8 · edited Jul 24 '20 at 23:19

Just try this:

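library(magrittr)  # provides %>%
DF %>% t %>% na.omit %>% t

It transposes the data frame, omits the rows that contain NA (these were columns before transposition), and then transposes the result back. Note that this drops every column containing an NA rather than rows, and it returns a matrix instead of a data frame.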
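
To see what the transpose approach returns, here is the same pipeline written with nested calls on the DF from the question, followed by a base R alternative that keeps a data frame (res and res_df are illustrative names):

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Same steps as the pipeline, nested so that no pipe package is needed
res <- t(na.omit(t(DF)))
is.matrix(res)   # TRUE: t() converts the data frame to a matrix
colnames(res)    # "x": y and z each contain an NA, so both are dropped

# Alternative that keeps a data frame: select the columns with zero NA values
res_df <- DF[, colSums(is.na(DF)) == 0, drop = FALSE]
```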

edited Jul 24 '20 at 23:19

M--

answered Aug 22 '19 at 19:59

lqi


Please explain a bit what is going on. – vonbrand Aug 22 '19 at 20:17

score 2 · Answer 9 · answered Sep 20 '21 at 02:56

To update, a tidyverse approach with dplyr:

library(dplyr)

your_data_frame %>% 
  filter(!is.na(region_column))
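
Applied to the four-row DF from the question, the same pattern might look like this (a sketch assuming dplyr is installed; by_y and by_xz are illustrative names):

```r
library(dplyr)

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# One column: keep rows where y is not NA
by_y <- DF %>% filter(!is.na(y))

# Two columns: conditions separated by commas are combined with AND
by_xz <- DF %>% filter(!is.na(x), !is.na(z))
```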

Vinícius Félix

score 0 · Answer 10 · answered Aug 27 '22 at 13:37

You don't need to create a custom function with complete.cases to remove the rows with NA in a certain column. Here is a reproducible example:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF
#>   x  y  z
#> 1 1  0 NA
#> 2 2 10 33
#> 3 3 NA 22
DF[complete.cases(DF$y),]
#>   x  y  z
#> 1 1  0 NA
#> 2 2 10 33

Created on 2022-08-27 with reprex v2.0.2

As you can see, this removes the row with NA in the specified column (y).
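
The same idea extends to more than one column, because complete.cases() accepts several vectors or a data frame of the selected columns. A short sketch with the four-row DF from the question (res and res2 are illustrative names):

```r
DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# A row counts as complete only if both x and z are non-NA
res <- DF[complete.cases(DF$x, DF$z), ]

# Equivalent: pass the selected columns as a data frame
res2 <- DF[complete.cases(DF[, c("x", "z")]), ]
```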
