169

I want to know how to omit NA values in a data frame, but only in some columns I am interested in.

For example,

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

but I only want to omit the data where y is NA, therefore the result should be

  x  y  z
1 1  0 NA
2 2 10 33

na.omit seems delete all rows contain any NA.

Can somebody help me out of this simple question?

But if now I change the question like:

DF <- data.frame(x = c(1, 2, 3,NA), y = c(1,0, 10, NA), z=c(43,NA, 33, NA))

If I want to omit only x=na or z=na, where can I put the | in function?

John Paul
  • 12,196
  • 6
  • 55
  • 75
user1489975
  • 1,841
  • 2
  • 14
  • 8

10 Answers10

241

Use is.na

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]
mnel
  • 113,303
  • 27
  • 265
  • 254
105

Hadley's tidyr just got this amazing function drop_na

library(tidyr)
DF %>% drop_na(y)
  x  y  z
1 1  0 NA
2 2 10 33
amrrs
  • 6,215
  • 2
  • 18
  • 27
  • 8
    This method also allows you to specify more than one column (for dropping NA values). For instance, one could use DF %>% drop_na(y,z) to remove NA values in both columns, y, and z. – SolingerStuebchen Sep 23 '20 at 09:59
  • @SolingerStuebchen can you pass a list for the columns to drop? – queste Mar 24 '23 at 03:32
  • 1
    @queste yes, that is possible. You can do the following to drop NA values in multiple columns. First, define a list of column to be checked: drop_list <- c("y","z"). Second, you call DF%>% drop_na(drop_list). – CausalQuestions Apr 02 '23 at 22:53
95

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))
BenBarnes
  • 19,114
  • 6
  • 56
  • 74
36

Use 'subset'

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))
Rnoob
  • 1,013
  • 1
  • 11
  • 12
17

It is possible to use na.omit for data.table:

na.omit(data, cols = c("x", "z"))
M--
  • 25,431
  • 8
  • 61
  • 93
Droney
  • 179
  • 1
  • 4
  • 7
    the `cols=` argument is available in the `data.table::na.omit` library. Not the base `stats::na.omit`. – M.Viking Aug 21 '19 at 18:39
7

Omit row if either of two specific columns contain <NA>.

DF[!is.na(DF$x)&!is.na(DF$z),]
M.Viking
  • 5,067
  • 4
  • 17
  • 33
3

Try this:

cc=is.na(DF$y)
m=which(cc==c("TRUE"))
DF=DF[-m,]
rockswap
  • 623
  • 1
  • 7
  • 17
2

Just try this:

DF %>% t %>% na.omit %>% t

It transposes the data frame and omits null rows which were 'columns' before transposition and then you transpose it back.

M--
  • 25,431
  • 8
  • 61
  • 93
lqi
  • 121
  • 2
  • 4
2

To update, a tidyverse approach with dplyr:

library(dplyr)

your_data_frame %>% 
  filter(!is.na(region_column))
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32
0

You don't need to create a custom function with complete.cases to remove the rows with NA in a certain column. Here is a reproducible example:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF
#>   x  y  z
#> 1 1  0 NA
#> 2 2 10 33
#> 3 3 NA 22
DF[complete.cases(DF$y),]
#>   x  y  z
#> 1 1  0 NA
#> 2 2 10 33

Created on 2022-08-27 with reprex v2.0.2

As you can see, it removed the row with NA in certain column.

Quinten
  • 35,235
  • 5
  • 20
  • 53