[data.table] I have written a function like this to replace NA with 0 if a column is numeric

fn.naremove <- function(data) {
  for (i in 1:length(data)) {
    if (class(data[[i]]) %in% c("numeric", "integer", "integer64")) {
      print(data[, names(data[, i]) := replace(data[, i], is.na(data[, i]), 0)])
    } else {
      print(data)
    }
  }
}

I have a sample data.table like the one below:

dt1<- data.table(C1= c(1, 5, 14, NA, 54), C2= c(9, NA, NA, 3, 42), C3= c(9, 7, 42, 87, NA))

If I call fn.naremove(dt1), it returns this error:

Error in `[.data.table`(data, , i) : 
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. 
Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.

If I run the code with the actual column index, it runs smoothly and returns the result I wanted for column number 1:

dt1[, names(dt1[, 1]) := replace(dt1[, 1], is.na(dt1[, 1]), 0)]

  C1 C2 C3
1:  1  9  9
2:  5 NA  7
3: 14 NA 42
4:  0  3 87
5: 54 42 NA

Please tell me what I missed or did wrong in my function. Thanks in advance!

Rui Barradas
1darknight
  • Why are you printing in your function? – Roland May 20 '21 at 05:28
  • I searched Stack Overflow and learned that if you want the code inside a function to work, you need to wrap it in print(), because then you see the result printed in the R console. – 1darknight May 20 '21 at 07:08

2 Answers

You may use replace.

replace(dt1, is.na(dt1), 0)
#    C1 C2 C3
# 1:  1  9  9
# 2:  5  0  7
# 3: 14  0 42
# 4:  0  3 87
# 5: 54 42  0

data.table also has a function of its own for this, set(), which updates a table by reference and which we can extend to target specific classes.

library(data.table)

dt1 <- cbind(dt1, x = c("a", NA, "a", NA, "a"))  ## add a categorical variable
classes <- c("numeric", "integer", "integer64")  ## classes whose NAs get replaced

fun <- function(DT) {
  for (j in names(DT)) {
    ## rows that are NA, but only when the column's class is one of `classes`
    set(DT, which(is.na(DT[[j]]) & any(class(DT[[j]]) %in% classes)), j, 0)
  }
}

fun(dt1)
dt1
#    C1 C2 C3    x
# 1:  1  9  9    a
# 2:  5  0  7 <NA>
# 3: 14  0 42    a
# 4:  0  3 87 <NA>
# 5: 54 42  0    a

Only NAs in columns of the listed classes are replaced. This should be the most efficient approach, since set() updates by reference and no copies are made.
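As a possible alternative, recent data.table versions also ship setnafill(), which fills NAs by reference. The sketch below assumes such a version is installed, uses a made-up table dt2, and selects columns with is.numeric() as a stand-in for the class list above (integer64 columns are deliberately left out of the example). Character columns have to be excluded through cols.

```r
library(data.table)

# Hypothetical sample table: two double columns and one character column
dt2 <- data.table(C1 = c(1, 5, 14, NA, 54),
                  C2 = c(9, NA, NA, 3, 42),
                  x  = c("a", NA, "a", NA, "a"))

# Names of the numeric columns; the character column x is excluded
num_cols <- names(dt2)[vapply(dt2, is.numeric, logical(1))]

# Replace NA with the constant 0 in those columns, by reference
setnafill(dt2, type = "const", fill = 0, cols = num_cols)

dt2
```

The NAs in x are left untouched because x is not listed in cols.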

jay.sf

Note that names(dt1[, 1]) works, but when you do:

i <- 1
names(dt1[, i])

it doesn't work and returns an error:

Error in `[.data.table`(dt1, , i) : 
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. 
Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.

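The solution is to use ..i, i.e. names(dt1[, ..i]). Inside DT[, j], a bare symbol is treated as a column name; the .. prefix tells data.table to look the variable up in the calling scope instead.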
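A short sketch of the lookup rule, using the sample table from the question. The variable col_a is only there for illustration; all three expressions select the same column name.

```r
library(data.table)

dt1 <- data.table(C1 = c(1, 5, 14, NA, 54),
                  C2 = c(9, NA, NA, 3, 42),
                  C3 = c(9, 7, 42, 87, NA))
i <- 1

# ..i: look up i in the calling scope rather than among the columns
col_a <- names(dt1[, ..i])

# with = FALSE: the older spelling of the same idea
col_b <- names(dt1[, i, with = FALSE])

# Indexing the names vector directly avoids subsetting the table at all
col_c <- names(dt1)[i]

col_a
```

When only the column name is needed, names(dt1)[i] is the simplest of the three.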


Another option is to index the names and the column directly:

fn.naremove <- function(data) {
  for (i in seq_along(data)) {
    # only touch columns of the listed classes
    if (any(class(data[[i]]) %in% c("numeric", "integer", "integer64"))) {
      # names(data)[i] and data[[i]] avoid the data[, i] lookup problem
      print(data[, names(data)[i] := replace(data[[i]], is.na(data[[i]]), 0)])
    } else {
      print(data)
    }
  }
}
fn.naremove(dt1)
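On the print() question raised in the comments: := and set() modify the table by reference, so the change persists without printing anything. A minimal print-free sketch follows. The function name fill_numeric_na and the table dt3 are hypothetical, and is.numeric() is assumed as a stand-in for the class check.

```r
library(data.table)

# Hypothetical print-free variant: updates numeric columns in place
fill_numeric_na <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      # set() changes `data` by reference; no print() is required
      set(data, i = which(is.na(data[[col]])), j = col, value = 0)
    }
  }
  invisible(data)  # return the table without auto-printing it
}

dt3 <- data.table(C1 = c(1, NA), C2 = c(NA, 2), x = c("a", NA))
fill_numeric_na(dt3)
dt3
```

Printing dt3 afterwards shows the filled values, because the function changed the original table rather than a copy.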
Ronak Shah