54

I would like to replace NA values with zeros using mutate_if in dplyr. Both calls below:

set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5),
       sample(1:dim(mtcars)[2], 5)] <-  NA

require(dplyr)

mtcars %>% 
    mutate_if(is.na,0)

mtcars %>% 
    mutate_if(is.na, funs(. = 0))

return the error:

Error in vapply(tbl, p, logical(1), ...) : values must be length 1, but FUN(X[[1]]) result is length 32

What's the correct syntax for this operation?

Anoushiravan R
Konrad
  • For this particular task, you might also consider the simpler `tidyr::replace_na` rather than the more generic `mutate_if` approaches – cboettig Jun 06 '18 at 03:16

5 Answers

54

I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem:
First, define custom functions outside the pipe and use them in mutate_if():

any_column_NA <- function(x){
    any(is.na(x))
}
replace_NA_0 <- function(x){
    if_else(is.na(x),0,x)
}
mtcars %>% mutate_if(any_column_NA,replace_NA_0)

Second, combine ~ with . or .x (.x can be replaced with ., but not with any other character or symbol):

mtcars %>% mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x))
#This also works
mtcars %>% mutate_if(~ any(is.na(.)),~ if_else(is.na(.),0,.))

In your case, you can also use mutate_all():

mtcars %>% mutate_all(~ if_else(is.na(.x),0,.x))

With ~ we can define an anonymous function, in which .x or . stands for its argument. In mutate_if(), . or .x is each column in turn.
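To make the equivalence concrete, here is a minimal sketch comparing the ~ shorthand with ordinary function(x) definitions. It assumes dplyr is installed; the toy tibble df and the names with_lambda and with_function are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: columns a and b contain NA, column c does not
df <- tibble(a = c(1, NA, 3), b = c(NA, 5, 6), c = c(7, 8, 9))

# Formula shorthand: .x stands for the column currently being processed
with_lambda <- df %>%
  mutate_if(~ any(is.na(.x)), ~ if_else(is.na(.x), 0, .x))

# The same predicate and transformation written as ordinary functions
with_function <- df %>%
  mutate_if(function(x) any(is.na(x)),
            function(x) if_else(is.na(x), 0, x))

# Both versions should give the same result
all.equal(with_lambda, with_function)
```

The shorthand only saves typing; the predicate and the transformation are still called once per column.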

yusuzech
  • Purrr Tutorial has moved to https://rstudio.com/resources/rstudioconf-2017/happy-r-users-purrr-tutorial-/ – David T May 17 '20 at 23:20
53

The "if" in mutate_if refers to choosing columns, not rows. E.g., mutate_if(data, is.numeric, ...) means to carry out a transformation on all numeric columns in your dataset.

If you want to replace all NAs with zeros in numeric columns:

# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
data %>% mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
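The error in the question follows from this: the predicate is called once per column and must return a single TRUE or FALSE, but is.na() returns one value per element. The sketch below reproduces that check in base R with vapply(), which is the call named in the error message. The toy data frame df and the variable names are assumptions for illustration, and the last two lines are a base-R stand-in for the replacement step, not what mutate_if does internally.

```r
# Hypothetical toy data: a numeric column with NA, a character column with NA,
# and a numeric column without NA
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA), c = c(4, 5, 6))

# is.na() returns one value per element, so it fails as a column predicate
bad <- tryCatch(
  vapply(df, is.na, logical(1)),
  error = function(e) conditionMessage(e)
)

# anyNA() returns a single TRUE/FALSE per column, which is what a predicate needs
has_na <- vapply(df, anyNA, logical(1))

# Base-R equivalent of the replacement, limited to numeric columns that contain NA
target <- has_na & vapply(df, is.numeric, logical(1))
df[target] <- lapply(df[target], function(x) replace(x, is.na(x), 0))
```

Here bad holds the error message rather than a result, while has_na flags columns a and b.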
Hong Ooi
  • Works fine. One might use `if_else` instead to stay in the `tidyverse` and benefit from the additional check that the TRUE and FALSE values have consistent types – aurelien Oct 13 '17 at 11:40
  • If you want to check whether a value is NA or equal to "NA" in the ifelse, how can you do that (add another condition)? – Mostafa90 Sep 06 '18 at 07:34
26
mtcars %>% mutate_if(is.numeric, replace_na, 0)

or, with the more recent across() syntax (replace_na comes from tidyr in both cases):

mtcars %>% mutate(across(where(is.numeric),
                         replace_na, 0))
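One caveat about the across() form: as of dplyr 1.1.0, passing extra arguments through the ... of across() is deprecated, so the replacement value is better supplied inside a lambda. A minimal sketch, assuming dplyr (1.0 or later) and tidyr are installed; the toy tibble df is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one numeric and one character column, both with NA
df <- tibble(a = c(1, NA, 3), b = c("x", NA, "z"))

# Replace NA with 0 in numeric columns only; the character column is left alone
out <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```

Because of where(is.numeric), the NA in column b is kept, which is usually what you want when 0 is not a sensible fill value for text.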
Nettle
  • Simplicity is important. If a simple line of code can do the same thing as more complex, long code, I think it should be chosen instead. – Darius Jan 24 '20 at 23:38
  • This should be in the help page for `mutate_if`. Thanks for making my life easier. – Megatron Oct 14 '21 at 20:35
4

We can use set from data.table

library(data.table)
setDT(mtcars)
for(j in seq_along(mtcars)){
  set(mtcars, i= which(is.na(mtcars[[j]])), j = j, value = 0)
}
akrun
  • How might this be modified to only operate on numeric variables please? – Rick Pack Jan 20 '21 at 19:33
  • @RickPack. You could change the `for(j in seq_along(mtcars))` to `nm1 <- names(mtcars)[mtcars[, unlist(lapply(.SD, is.numeric))]]; for(j in nm1)` – akrun Jan 20 '21 at 19:35
3

I always struggle with the replace_na function from tidyr, so I use base R's replace inside the pipe:

  mtcars %>% replace(is.na(.), 0)

This works for me for what you are trying to do.
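For reference, the pipe fills in the data frame as the first argument of replace(), so the call amounts to replace(df, is.na(df), 0). A minimal base-R sketch on a made-up data frame df:

```r
# Hypothetical toy data with NA in both columns
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# replace(x, list, values): is.na(df) is a logical matrix marking the NA cells
out <- replace(df, is.na(df), 0)
```

Note that this touches every column, not only numeric ones, so it fits best when all columns are numeric.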

ok1more