How to replace NA values in a table for selected columns

Question

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:

x[is.na(x)]<-0

But, what if I want to restrict it to only certain columns? Let's me show you an example.

First, let's start with a dataset.

set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                b=sample(c(1,2,NA), 10, replace=T), 
                c=sample(c(1:5,NA), 10, replace=T))

Which gives:

    a  b  c
1   1 NA  2
2   2  2  2
3   2  1  1
4   2 NA  1
5  NA  1  2
6   2 NA  5
7   1  1  4
8   1  1 NA
9   2  1  5
10  2  1  1

Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:

x[is.na(x), 1:2]<-0

and:

x[is.na(x[1:2])]<-0

Which does not work.

My data.table attempt, where y<-data.table(x), was obviously never going to work:

y[is.na(y[,list(a,b)]), ]

I want to pass columns inside the is.na argument but that obviously wouldn't work.

I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns so I don't want to do it one by one. And, I'd just like to know how to do this.

Do you have any suggestions?

score 139 · Accepted Answer · answered Oct 15 '13 at 10:52

139

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.

answered Oct 15 '13 at 10:52

flodel

87,577
21
185
223

That does the job. What about if I want to search for '1'? I tried to change it around but I couldn't get it to work. – jnam27 Oct 15 '13 at 11:07
6

Probably like this: `x[, 1:2][x[, 1:2] == 1] <- 0` – flodel Oct 15 '13 at 11:08
@flodel why does the datatable `x` accept a matrix as its first member only when doing assignation ? Is this feature documented somewhere ? Also I think you forgot to put a comma before the vectors with column names in you second example. – ChiseledAbs Dec 09 '16 at 02:35
@ChiseledAbs, I think you are referring to matrix indexing (see this for example http://stackoverflow.com/a/13999583/1201032), but it is not limited to assignments, it can also be used to extract data. Regarding the missing comma: no. Data.frames are lists of columns so if you use a single argument to `[`, it will extract the specified columns (see http://stackoverflow.com/a/21137524/1201032). I hope this answers your question but in the future, please avoid commenting on very old answers like this one; instead post a new question. – flodel Dec 09 '16 at 23:38
1

`In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.` When I used a predefined vector like this `x[Vpredefined][is.na(x[Vpredefined])] <- 0` it gives me error – Rohit Saluja Feb 28 '18 at 12:37
@RohitSaluja, maybe your `x` is a matrix and not a data.frame like was the case in this question? – flodel Mar 03 '18 at 13:07
My x is a data.table – Rohit Saluja Mar 05 '18 at 07:33
1

@RohitSaluja is correct, using a predefined vector, this approach doesn't work. One can do x[,..Vpredefined], but the second call does not work. – hrrrrrr5602 Jan 15 '21 at 17:37
May i ask how can we do that in pipe `%>%`? – ah bon Oct 19 '21 at 01:55

score 49 · Answer 2 · answered Mar 22 '19 at 02:23

49

Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns the NAs are replaced:

library(tidyverse)

# by column type:
x %>%
  mutate_if(is.numeric, ~replace_na(., 0))

# select columns defined in vars(col1, col2, ...):
x %>%
  mutate_at(vars(a, b, c), ~replace_na(., 0))

# all columns:
x %>%
  mutate_all(~replace_na(., 0))

answered Mar 22 '19 at 02:23

sbha

9,802
2
74
62

1

With this function I get the Error: `Error in replace_na(., 0) : argument "value" is missing, with no default`. Any suggestions what to change? – Tim M. Schendzielorz Apr 05 '19 at 09:58

score 42 · Answer 3 · edited Jun 15 '20 at 15:14

42

Edit 2020-06-15

Since data.table 1.12.4 (Oct 2019), data.table gains two functions to facilitate this: nafill and setnafill.

nafill operates on columns:

cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]

setnafill operates on tables (the replacements happen by-reference/in-place)

setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]

This will also be more efficient than the other options; see ?nafill for more, the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.

This will work for your data.table version:

for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]

Alternatively, as David Arenburg points out below, you can use set (side benefit - you can use it either on data.frame or data.table):

for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)

edited Jun 15 '20 at 15:14

MichaelChirico

33,841
14
113
198

answered Oct 15 '13 at 15:07

eddi

49,088
6
104
155

thanks for this. Just wanted to know, 3 years on, if there are ways to do the above without a for loop? I imagine this would have been made more concise by data.table team? Thanks. – info_seekeR Jan 14 '16 at 13:18
1

@info_seekeR I don't know of a more concise way – eddi Jan 14 '16 at 15:43
1

this is a better solution than the selected answer by flodel. Flodel's approach uses the assignment operator <- and therefore involves unnecessary data copying. – Michael Dec 19 '18 at 00:12
@MichaelChirico In the first part of your comment, did you added the step `out <- x` to avoid missunderstanding with the x data.frame from the question ? Otherwise this is an even shorter command: `y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]` skipping the 'out' variable name and use of 'x'. – Yoann Pageaud Jun 15 '20 at 14:41
@MichaelChirico True ! I totally forgot about nafill() – Yoann Pageaud Jun 15 '20 at 22:56
```set``` is great because you can also replace strings (not supported yet in nafill) – Gooze Aug 14 '20 at 13:49

score 21 · Answer 4 · edited Oct 19 '21 at 01:55

21

This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:

tidyr::replace_na(x, list(a=0, b=0))

edited Oct 19 '21 at 01:55

ah bon

9,293
12
65
148

answered Dec 09 '17 at 14:54

Robert McDonald

1,250
1
12
20

score 3 · Answer 5 · answered Jan 19 '16 at 21:36

Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:

update.mat <- function(dt, cols, criteria) {
  require(data.table)
  x <- as.data.frame(which(criteria==TRUE, arr.ind = TRUE))
  y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
  y
}

To apply it:

y[update.mat(y, c("a", "b"), is.na(y))] <- 0

The function creates a matrix of the selected columns and rows (cell coordinates) that meet the input criteria (in this case is.na == TRUE).

score 1 · Answer 6 · edited Nov 07 '19 at 21:18

1

We can solve it in data.table way with tidyr::repalce_na function and lapply

library(data.table)
library(tidyr)
setDT(df)
df[,c("a","b","c"):=lapply(.SD,function(x) replace_na(x,0)),.SDcols=c("a","b","c")]

In this way, we can also solve paste columns with NA string. First, we replace_na(x,""),then we can use stringr::str_c to combine columns!

edited Nov 07 '19 at 21:18

sbha

9,802
2
74
62

answered Sep 01 '19 at 11:26

young Chen

27
3

2

Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you’ve made. – CertainPerformance Sep 01 '19 at 14:50

Yoann Pageaud · Answer 7 · 2020-06-15T14:40:16.037

1

Starting from the data.table y, you can just write:
y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]
Don't forget to library(data.table) before creating y and running this command.

edited Jun 15 '20 at 14:40

answered Jun 15 '20 at 14:34

Yoann Pageaud

412
5
22

score 1 · Answer 8 · answered Mar 12 '22 at 10:57

This needed a bit extra for dealing with NA's in factors.

Found a useful function here, which you can then use with mutate_at or mutate_if:

replace_factor_na <- function(x){
    x <- as.character(x)
    x <- if_else(is.na(x), 'NONE', x)
    x <- as.factor(x)
}

df <- df %>%
    mutate_at(
        vars(vector_of_column_names), 
        replace_factor_na
    )

Or apply to all factor columns:

df <- df %>%
  mutate_if(is.factor, replace_factor_na)

score 0 · Answer 9 · answered Sep 24 '16 at 20:52

For a specific column, there is an alternative with sapply

DF <- data.frame(A = letters[1:5],
             B = letters[6:10],
             C = c(2, 5, NA, 8, NA))

DF_NEW <- sapply(seq(1, nrow(DF)),
                    function(i) ifelse(is.na(DF[i,3]) ==
                                       TRUE,
                                       0,
                                       DF[i,3]))

DF[,3] <- DF_NEW
DF

score 0 · Answer 10 · edited Mar 06 '22 at 19:28

0

it's quite handy with data.table and stringr

library(data.table)
library(stringr)

x[, lapply(.SD, function(xx) {str_replace_na(xx, 0)})]

FYI

edited Mar 06 '22 at 19:28

micstr

5,080
8
48
76

answered Sep 10 '19 at 19:56

Grec001

1,111
6
20

stragu · Answer 11 · 2021-12-08T08:47:45.213

For completeness, built upon @sbha's answer, here is the tidyverse version with the across() function that's available in dplyr since version 1.0 (which supersedes the *_at() variants, and others):

# random data
set.seed(1234)
x <- data.frame(a = sample(c(1, 2, NA), 10, replace = T),
                b = sample(c(1, 2, NA), 10, replace = T), 
                c = sample(c(1:5, NA), 10, replace = T))
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
# with the magrittr pipe
x %>% mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1
# with the native pipe (since R 4.1)
x |> mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1

^{Created on 2021-12-08 by the reprex package (v2.0.1)}

ah, maybe because I was applying it to columns 2:3 instead of 1:2...? Fixed now. — stragu, Dec 08 '21 at 08:48

score -4 · Answer 12 · answered Jul 27 '16 at 22:41

-4

this works fine for me

DataTable DT = new DataTable();

DT = DT.AsEnumerable().Select(R =>
{
      R["Campo1"] = valor;
      return (R);
}).ToArray().CopyToDataTable();

answered Jul 27 '16 at 22:41

Juanico Lasa

1

1

is this R? looks like C# – Chris McKelt Nov 09 '16 at 12:41

How to replace NA values in a table for selected columns

12 Answers12

Edit 2020-06-15

Linked

Related