101

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:

x[is.na(x)]<-0

But, what if I want to restrict it to only certain columns? Let's me show you an example.

First, let's start with a dataset.

set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                b=sample(c(1,2,NA), 10, replace=T), 
                c=sample(c(1:5,NA), 10, replace=T))

Which gives:

    a  b  c
1   1 NA  2
2   2  2  2
3   2  1  1
4   2 NA  1
5  NA  1  2
6   2 NA  5
7   1  1  4
8   1  1 NA
9   2  1  5
10  2  1  1

Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:

x[is.na(x), 1:2]<-0

and:

x[is.na(x[1:2])]<-0

Which does not work.

My data.table attempt, where y<-data.table(x), was obviously never going to work:

y[is.na(y[,list(a,b)]), ]

I want to pass columns inside the is.na argument but that obviously wouldn't work.

I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns so I don't want to do it one by one. And, I'd just like to know how to do this.

Do you have any suggestions?

M--
  • 25,431
  • 8
  • 61
  • 93
jnam27
  • 1,367
  • 2
  • 12
  • 16

12 Answers12

139

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.

flodel
  • 87,577
  • 21
  • 185
  • 223
  • That does the job. What about if I want to search for '1'? I tried to change it around but I couldn't get it to work. – jnam27 Oct 15 '13 at 11:07
  • 6
    Probably like this: `x[, 1:2][x[, 1:2] == 1] <- 0` – flodel Oct 15 '13 at 11:08
  • @flodel why does the datatable `x` accept a matrix as its first member only when doing assignation ? Is this feature documented somewhere ? Also I think you forgot to put a comma before the vectors with column names in you second example. – ChiseledAbs Dec 09 '16 at 02:35
  • @ChiseledAbs, I think you are referring to matrix indexing (see this for example http://stackoverflow.com/a/13999583/1201032), but it is not limited to assignments, it can also be used to extract data. Regarding the missing comma: no. Data.frames are lists of columns so if you use a single argument to `[`, it will extract the specified columns (see http://stackoverflow.com/a/21137524/1201032). I hope this answers your question but in the future, please avoid commenting on very old answers like this one; instead post a new question. – flodel Dec 09 '16 at 23:38
  • 1
    `In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.` When I used a predefined vector like this `x[Vpredefined][is.na(x[Vpredefined])] <- 0` it gives me error – Rohit Saluja Feb 28 '18 at 12:37
  • @RohitSaluja, maybe your `x` is a matrix and not a data.frame like was the case in this question? – flodel Mar 03 '18 at 13:07
  • My x is a data.table – Rohit Saluja Mar 05 '18 at 07:33
  • 1
    @RohitSaluja is correct, using a predefined vector, this approach doesn't work. One can do x[,..Vpredefined], but the second call does not work. – hrrrrrr5602 Jan 15 '21 at 17:37
  • May i ask how can we do that in pipe `%>%`? – ah bon Oct 19 '21 at 01:55
49

Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns the NAs are replaced:

library(tidyverse)

# by column type:
x %>%
  mutate_if(is.numeric, ~replace_na(., 0))

# select columns defined in vars(col1, col2, ...):
x %>%
  mutate_at(vars(a, b, c), ~replace_na(., 0))

# all columns:
x %>%
  mutate_all(~replace_na(., 0))
sbha
  • 9,802
  • 2
  • 74
  • 62
42

Edit 2020-06-15

Since data.table 1.12.4 (Oct 2019), data.table gains two functions to facilitate this: nafill and setnafill.

nafill operates on columns:

cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]

setnafill operates on tables (the replacements happen by-reference/in-place)

setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]

This will also be more efficient than the other options; see ?nafill for more, the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.


This will work for your data.table version:

for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]

Alternatively, as David Arenburg points out below, you can use set (side benefit - you can use it either on data.frame or data.table):

for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
eddi
  • 49,088
  • 6
  • 104
  • 155
  • thanks for this. Just wanted to know, 3 years on, if there are ways to do the above without a for loop? I imagine this would have been made more concise by data.table team? Thanks. – info_seekeR Jan 14 '16 at 13:18
  • 1
    @info_seekeR I don't know of a more concise way – eddi Jan 14 '16 at 15:43
  • 1
    this is a better solution than the selected answer by flodel. Flodel's approach uses the assignment operator <- and therefore involves unnecessary data copying. – Michael Dec 19 '18 at 00:12
  • @MichaelChirico In the first part of your comment, did you added the step `out <- x` to avoid missunderstanding with the x data.frame from the question ? Otherwise this is an even shorter command: `y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]` skipping the 'out' variable name and use of 'x'. – Yoann Pageaud Jun 15 '20 at 14:41
  • @MichaelChirico True ! I totally forgot about nafill() – Yoann Pageaud Jun 15 '20 at 22:56
  • ```set``` is great because you can also replace strings (not supported yet in nafill) – Gooze Aug 14 '20 at 13:49
21

This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:

tidyr::replace_na(x, list(a=0, b=0))
ah bon
  • 9,293
  • 12
  • 65
  • 148
Robert McDonald
  • 1,250
  • 1
  • 12
  • 20
3

Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:

update.mat <- function(dt, cols, criteria) {
  require(data.table)
  x <- as.data.frame(which(criteria==TRUE, arr.ind = TRUE))
  y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
  y
}

To apply it:

y[update.mat(y, c("a", "b"), is.na(y))] <- 0

The function creates a matrix of the selected columns and rows (cell coordinates) that meet the input criteria (in this case is.na == TRUE).

Amy M
  • 967
  • 1
  • 9
  • 19
1

We can solve it in data.table way with tidyr::repalce_na function and lapply

library(data.table)
library(tidyr)
setDT(df)
df[,c("a","b","c"):=lapply(.SD,function(x) replace_na(x,0)),.SDcols=c("a","b","c")]

In this way, we can also solve paste columns with NA string. First, we replace_na(x,""),then we can use stringr::str_c to combine columns!

sbha
  • 9,802
  • 2
  • 74
  • 62
young Chen
  • 27
  • 3
  • 2
    Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you’ve made. – CertainPerformance Sep 01 '19 at 14:50
1

Starting from the data.table y, you can just write:
y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]
Don't forget to library(data.table) before creating y and running this command.

Yoann Pageaud
  • 412
  • 5
  • 22
1

This needed a bit extra for dealing with NA's in factors.

Found a useful function here, which you can then use with mutate_at or mutate_if:

replace_factor_na <- function(x){
    x <- as.character(x)
    x <- if_else(is.na(x), 'NONE', x)
    x <- as.factor(x)
}

df <- df %>%
    mutate_at(
        vars(vector_of_column_names), 
        replace_factor_na
    )

Or apply to all factor columns:

df <- df %>%
  mutate_if(is.factor, replace_factor_na)
0

For a specific column, there is an alternative with sapply

DF <- data.frame(A = letters[1:5],
             B = letters[6:10],
             C = c(2, 5, NA, 8, NA))

DF_NEW <- sapply(seq(1, nrow(DF)),
                    function(i) ifelse(is.na(DF[i,3]) ==
                                       TRUE,
                                       0,
                                       DF[i,3]))

DF[,3] <- DF_NEW
DF
Rafa
  • 359
  • 6
  • 5
0

it's quite handy with data.table and stringr

library(data.table)
library(stringr)

x[, lapply(.SD, function(xx) {str_replace_na(xx, 0)})]

FYI

micstr
  • 5,080
  • 8
  • 48
  • 76
Grec001
  • 1,111
  • 6
  • 20
0

For completeness, built upon @sbha's answer, here is the tidyverse version with the across() function that's available in dplyr since version 1.0 (which supersedes the *_at() variants, and others):

# random data
set.seed(1234)
x <- data.frame(a = sample(c(1, 2, NA), 10, replace = T),
                b = sample(c(1, 2, NA), 10, replace = T), 
                c = sample(c(1:5, NA), 10, replace = T))
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
# with the magrittr pipe
x %>% mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1
# with the native pipe (since R 4.1)
x |> mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1

Created on 2021-12-08 by the reprex package (v2.0.1)

stragu
  • 1,051
  • 9
  • 15
-4

this works fine for me

DataTable DT = new DataTable();

DT = DT.AsEnumerable().Select(R =>
{
      R["Campo1"] = valor;
      return (R);
}).ToArray().CopyToDataTable();