
I want to write code that replaces NAs with 0 in all numeric columns using data.table syntax.

My code is the following:

library(data.table)

dt <- data.table(a = c(1:3, NA, NA, NA, 10:12), b = c(NA, NA, NA, 20:25), c = c(letters[1:7], NA, NA))

> dt
    a  b  c
1:  1 NA  a
2:  2 NA  b
3:  3 NA  c
4: NA 20  d
5: NA 21  e
6: NA 22  f
7: 10 23  g
8: 11 24 NA
9: 12 25 NA


needed_names <- names(dt)[sapply(dt, is.numeric)]

dt_ <- dt[, lapply(.SD, function(x){if(is.na(x)) 0 else x}), .SDcols = needed_names] 

> dt_
    a b
1:  1 0
2:  2 0
3:  3 0
4: NA 0
5: NA 0
6: NA 0
7: 10 0
8: 11 0
9: 12 0

Could you tell me why my code is not working and what I should do to correct it?

Any advice would be appreciated.

am7

2 Answers


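Alternatively, take the numeric columns out as a separate data.table, replace the NAs there, and bind the columns back together:

num_cols <- sapply(dt, is.numeric)
dt2 <- dt[, num_cols, with = FALSE]
dt2[is.na(dt2)] <- 0
dt <- cbind(dt[, !num_cols, with = FALSE], dt2)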
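A self-contained sketch of this subset-and-bind approach, assuming the same dt as in the question and that data.table is installed. Because cbind places the non-numeric columns first, the sketch also calls setcolorder (an addition, not part of the answer) to restore the original column order.

```r
library(data.table)

dt <- data.table(a = c(1:3, NA, NA, NA, 10:12),
                 b = c(NA, NA, NA, 20:25),
                 c = c(letters[1:7], NA, NA))
orig_order <- names(dt)

# Logical vector flagging the numeric columns (a and b here)
num_cols <- sapply(dt, is.numeric)

# Copy of the numeric columns only; replace every NA in that copy with 0
dt2 <- dt[, num_cols, with = FALSE]
dt2[is.na(dt2)] <- 0

# Bind the untouched non-numeric columns back, then restore the column order
dt <- cbind(dt[, !num_cols, with = FALSE], dt2)
setcolorder(dt, orig_order)
dt
```

This copies the numeric columns rather than updating dt in place, which is fine for small tables; the set() and := answers avoid the copy.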
Osdorp

We can do this with set, looping over the numeric columns (needed_names) and setting the elements that are NA, specified via i, to 0:

for(j in needed_names){
   set(dt, i = which(is.na(dt[[j]])), j=j, value = 0)
}
dt
#    a  b  c
#1:  1  0  a
#2:  2  0  b
#3:  3  0  c
#4:  0 20  d
#5:  0 21  e
#6:  0 22  f
#7: 10 23  g
#8: 11 24 NA
#9: 12 25 NA
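The set() loop can be wrapped in a small reusable function. The name na_to_fill and its fill argument are hypothetical, chosen only for illustration; this is a sketch, not a data.table function. Because set() assigns into selected rows of an existing column, integer columns such as a keep their integer type.

```r
library(data.table)

# Hypothetical helper (not part of data.table): replaces NA with `fill`
# in every numeric column of DT, modifying DT by reference via set()
na_to_fill <- function(DT, fill = 0) {
  num_names <- names(DT)[vapply(DT, is.numeric, logical(1))]
  for (j in num_names) {
    # which() gives the row numbers of the NAs; integer(0) is a no-op
    set(DT, i = which(is.na(DT[[j]])), j = j, value = fill)
  }
  invisible(DT)
}

dt <- data.table(a = c(1:3, NA, NA, NA, 10:12),
                 b = c(NA, NA, NA, 20:25),
                 c = c(letters[1:7], NA, NA))
na_to_fill(dt)
dt
```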

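Regarding the OP's code: if() evaluates a single condition, so when it is given a whole column it tests only the first element (with a warning in R < 4.2.0 and an error from R 4.2.0 onwards). That is why column a, whose first value is 1, comes back unchanged, while column b, whose first value is NA, collapses to a single 0 that is recycled. When the vector has more than one element, use a vectorised function such as ifelse or replace instead. The result also has to be assigned back to the columns of interest with :=; otherwise we only get a new data.table containing the .SDcols columns, and the original dataset is not updated: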

dt[, (needed_names) := lapply(.SD, function(x) 
          replace(x, is.na(x), 0)), .SDcols = needed_names] 
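A small sketch contrasting the vectorised functions with if() on a toy vector, then applying the := version to a freshly built dt (assuming data.table is installed). Note that replacing with 0, a double, turns the integer columns a and b into double columns; use 0L instead if the integer type should be kept.

```r
library(data.table)

x <- c(NA, 2, NA)
# replace() and ifelse() work element by element
replace(x, is.na(x), 0)   # 0 2 0
ifelse(is.na(x), 0, x)    # 0 2 0
# if (is.na(x)) 0 else x would test only x[1] (an error in R >= 4.2.0)

dt <- data.table(a = c(1:3, NA, NA, NA, 10:12),
                 b = c(NA, NA, NA, 20:25),
                 c = c(letters[1:7], NA, NA))
needed_names <- names(dt)[sapply(dt, is.numeric)]

# := writes the replaced columns back into dt by reference
dt[, (needed_names) := lapply(.SD, function(x) replace(x, is.na(x), 0)),
   .SDcols = needed_names]
dt
```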
akrun