35

After merging a dataframe with another im left with random NA's for the occasional row. I'd like to set these NA's to 0 so I can perform calculations with them.

Im trying to do this with:

    bothbeams.data = within(bothbeams.data, {
      bothbeams.data$x.x = ifelse(is.na(bothbeams.data$x.x) == TRUE, 0, bothbeams.data$x.x)
      bothbeams.data$x.y = ifelse(is.na(bothbeams.data$x.y) == TRUE, 0, bothbeams.data$x.y)
    })

Where $x.x is one column and $x.y is the other of course, but this doesn't seem to work.

MaikelS
  • 1,309
  • 4
  • 16
  • 33

6 Answers6

74

You can just use the output of is.na to replace directly with subsetting:

bothbeams.data[is.na(bothbeams.data)] <- 0

Or with a reproducible example:

dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
  x y
1 1 0
2 2 4
3 3 5
4 0 6

However, be careful using this method on a data frame containing factors that also have missing values:

> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"))
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated

It "works":

> d
  x    y
1 0    a
2 2 <NA>
3 3    c

...but you likely will want to specifically alter only the numeric columns in this case, rather than the whole data frame. See, eg, the answer below using dplyr::mutate_if.

joran
  • 169,992
  • 32
  • 429
  • 468
James
  • 65,548
  • 14
  • 155
  • 193
23

A solution using mutate_all from dplyr in case you want to add that to your dplyr pipeline:

library(dplyr)
df %>%
  mutate_all(funs(ifelse(is.na(.), 0, .)))

Result:

   A B C
1  0 0 0
2  1 0 0
3  2 0 2
4  3 0 5
5  0 0 2
6  0 0 1
7  1 0 1
8  2 0 5
9  3 0 2
10 0 0 4
11 0 0 3
12 1 0 5
13 2 0 5
14 3 0 0
15 0 0 1

If in any case you only want to replace the NA's in numeric columns, which I assume it might be the case in modeling, you can use mutate_if:

library(dplyr)
df %>%
  mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .)))

or in base R:

replace(is.na(df), 0)

Result:

   A    B C
1  0    0 0
2  1 <NA> 0
3  2    0 2
4  3 <NA> 5
5  0    0 2
6  0 <NA> 1
7  1    0 1
8  2 <NA> 5
9  3    0 2
10 0 <NA> 4
11 0    0 3
12 1 <NA> 5
13 2    0 5
14 3 <NA> 0
15 0    0 1

Update

with dplyr 1.0.0, across is introduced:

library(dplyr)
# Replace `NA` for all columns
df %>%
  mutate(across(everything(), ~ ifelse(is.na(.), 0, .)))

# Replace `NA` for numeric columns
df %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), 0, .)))

Data:

set.seed(123)
df <- data.frame(A=rep(c(0:3, NA), 3), 
                 B=rep(c("0", NA), length.out = 15), 
                 C=sample(c(0:5, NA), 15, replace = TRUE))
acylam
  • 18,231
  • 5
  • 36
  • 45
  • 1
    Note funs() was deprecated in dplyr 0.8.0 but mutate() with across() and and ifelse() is a good substitute. – SCallan Sep 28 '22 at 15:54
4

You can use replace_na() from tidyr package

df %>% replace_na(list(column1 = 0, column2 = 0)

yuliaUU
  • 1,581
  • 2
  • 12
  • 33
Saurav Das
  • 104
  • 2
1

To add to James's example, it seems you always have to create an intermediate when performing calculations on NA-containing data frames.

For instance, adding two columns (A and B) together from a data frame dfr:

temp.df <- data.frame(dfr) # copy the original
temp.df[is.na(temp.df)] <- 0
dfr$C <- temp.df$A + temp.df$B # or any other calculation
remove('temp.df')

When I do this I throw away the intermediate afterwards with remove/rm.

Louis Maddox
  • 5,226
  • 5
  • 36
  • 66
0

If you only want to replace NAs with 0s for a few select columns you also use an lapply solution, e.g:

data = data.frame(
 one = c(NA,0),
 two = c(NA,NA),
 three = c(1,2),
 four = c("A",NA)
)

data[1:2] = lapply(data[1:2],function(x){
  x[is.na(x)] = 0
  return(x)
})

data
-1

Why not try this

  na.zero <- function (x) {
        x[is.na(x)] <- 0
        return(x)
    }
    na.zero(df)
Deepesh
  • 820
  • 1
  • 14
  • 32