10

I want to replace 0 by NA in each column of a data.table.

library(data.table)
dt1 <- data.table(V1=0:2, V2=2:0)
dt1

   V1 V2
1:  0  2
2:  1  1
3:  2  0

dt1==0
       V1    V2
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

I tried this one

dt1[dt1==0] 
Error in `[.data.table`(dt1, dt1 == 0) : 
  i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please let datatable-help know if you'd like this, or add your comments to FR #1611.

And also tried this one

dt1[dt1==0, .SD :=NA] 

Edited

Partial sessionInfo()

R version 3.2.1 (2015-06-18)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 14.04.2 LTS

data.table_1.9.4
halfer
  • 19,824
  • 17
  • 99
  • 186
MYaseen208
  • 22,666
  • 37
  • 165
  • 309

2 Answers2

20

You can try set for multiple columns. It will be faster as the overhead of .[data.table is avoided

for(j in seq_along(dt1)){
         set(dt1, i=which(dt1[[j]]==0), j=j, value=NA)
}
dt1
#   V1 V2
#1: NA  2
#2:  1  1
#3:  2 NA

Or another option would be looping with lapply and then change the 0 values to NA with replace

dt1[, lapply(.SD, function(x) replace(x, which(x==0), NA))]

Or we can make use of some arthithmetic operations to convert the 0 value to NA.

 dt1[, lapply(.SD, function(x) (NA^!x) *x)]

The way (NA^!x)*x this works is by converting the !x i.e. a logical TRUE/FALSE vector for each column (where TRUE corresponds to 0 value) to NA and 1 by doing NA^!x. We multiply with the x value to replace the 1 with the x value corresponding to it while the NA will remain as such.

Or a syntax similar to base R would be

  is.na(dt1) <- dt1==0

But this method may not be that efficient for large data.table as dt1==0 would be a logical matrix and also as @Roland mentioned in the comments that the dataset would be copied. I would either use the lapply based or the more efficient set for larger datasets.

akrun
  • 874,273
  • 37
  • 540
  • 662
3

dt1[dt1==0] <- NA worked for me.

dt1[dt1==0] <- NA
dt1
##   V1 V2
##1: NA  2
##2:  1  1
##3:  2 NA

As noted by Roland, this does make a copy of the data.table object, and will be slower.

alexforrence
  • 2,724
  • 2
  • 27
  • 30