10

Assume the following datatable:

DT <- data.table(a=c(1,2,3,4,5,6),b=c(NaN,NaN,NaN,4,5,6),c=c(NaN,3,3,3,NaN,NaN))

How can I replace all NaN values by NA, i.e. in each column? Looking for an elegant (short) way to do this, but I am also wondering if it is possible using lapply, like I tried below.

My approach so far:

DT[,lapply(SD,function(x){x[is.nan(x)] := NA}),.SDcols=c("a","b","c")]  

Actual result of my code is:

Error in :=(x[is.nan(x)], NA) : Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").

markus
  • 25,843
  • 5
  • 39
  • 58
User878239
  • 649
  • 1
  • 7
  • 14
  • 1
    See end of Matt Dowle's answer here: https://stackoverflow.com/a/7249454/8583393 You would need `for(j in names(DT)) { set(DT, which(is.nan(DT[[j]])), j, NA) }` – markus Jan 09 '19 at 21:21
  • @markus thanks, that's it. I didn't see this post before, thanks. Need to use a good old for loop with `set` it seems :) – User878239 Jan 09 '19 at 21:25
  • 1
    Glad it worked. `set` is actually super fast so don't get 'blinded' by the loop. – markus Jan 09 '19 at 21:26

2 Answers2

20

You can do it easily using dplyr's mutate_all function.

DT <- DT %>% mutate_all(~ifelse(is.nan(.), NA, .))
print(DT)
#a  b  c
# 1 NA NA
# 2 NA  3
# 3 NA  3
# 4  4  3
# 5  5 NA
# 6  6 NA

The code above is equivalent to

DT %>% mutate_all(function(x) ifelse(is.nan(x), NA, x))

Here's one way to do it using data.table syntax. I'm not sure if there is a simpler way.

DT[, names(DT) := lapply(.SD, function(x) ifelse(is.nan(x), NA, x))]

EDIT: An alternative way to do this, as posted by markus in the comments beow.

DT[, lapply(.SD, function(x) replace(x, is.nan(x), NA))]
Dave Rosenman
  • 1,252
  • 9
  • 13
  • 1
    thanks, but isn't that `dplyr` syntax? I'd like to stick to datatable if possible just for one operation. – User878239 Jan 09 '19 at 21:19
  • I edited my response with a way to do it using data.table syntax. I mostly use dplyr, so I'm not sure if there's a simpler way to do it using data.table than the way I posted. – Dave Rosenman Jan 09 '19 at 21:27
  • thanks, this new approach works, but I also want to point out to markus solution in the comments. I will accept it for your efforts and because it works. – User878239 Jan 09 '19 at 21:43
  • 1
    @DavidRosenman The advantage of using `set` is that it does the replacement [by reference](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html). With your solution you'd need to do `DT <- DT[, ...]`. Also you could avoid `ifelse` and try `replace` instead `DT <- DT[, lapply(.SD, function(x) replace(x, is.nan(x), NA))]`. Best – markus Jan 09 '19 at 21:56
  • 1
    `DT %>% mutate(across(everything(), ~ifelse(is.nan(.), NA, .)))` is how you can do it with `across()` that [superseded](https://dplyr.tidyverse.org/reference/across.html#:~:text=across()%20supersedes%20the%20family%20of%20%22scoped%20variants%22%20like%20summarise_at()%2C%20summarise_if()%2C%20and%20summarise_all().) `mutate_all()` – Paul Schmidt Feb 16 '23 at 15:13
  • The `mutate_all` method changes `POSIXct` values to integers. I can't figure the `lapply` approaches because they give `Error: object '.SD' not found.` Is there a working way to replace `NaN` with `NA` or `NULL` that doesn't change data types? – MikeB Aug 16 '23 at 22:52
1

You want something like this:

DT[DT == "NaN"] <- NA

This should ideally find all the NaN that you have and replace it with NA. If anyone knows a better way please let me know!

Hope this helps!

Lasarus9
  • 83
  • 9
  • 2
    thanks, but my NaNs are not characters so your code does not work: Error in `[.data.table`(DT, DT == "NaN") : i is invalid type (matrix) – User878239 Jan 09 '19 at 21:17
  • 3
    In addition, NaN does not equate to true with any value, including with itself. See `is.nan` for the proper test. Otherwise, you have the right idea. – Matthew Lundberg Jan 09 '19 at 23:25