I have a data.table with two columns: an ID column and a value column. I want to split the table by the ID column and run a function foo on the value column. This works fine as long as foo does not return NAs. When it does, I get an error telling me that the types of the groups are not consistent.

My assumption is that, since is.logical(NA) is TRUE and is.numeric(NA) is FALSE, data.table sees a logical result for one group and a numeric result for the other, assumes that I want to combine logical values with numeric ones, and throws an error. However, I find this behavior peculiar. Any comments on that? Am I missing something obvious here, or is this indeed intended behavior? If so, a short explanation would be great. (Note that I do know a workaround: just let foo2 return a completely improbable number and filter for it later. However, this seems like bad coding practice.)
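The type hypothesis can be checked directly in base R. The sketch below only inspects types: the bare NA literal is logical, while R also provides typed missing values such as NA_real_ (double) and NA_integer_ (integer). A plain numeric literal like 1 is a double, so NA_real_ has the same type as 1 but NA does not.

```r
# The bare NA literal is of type logical, not numeric
typeof(NA)            # "logical"
is.numeric(NA)        # FALSE

# Typed missing values exist for the other atomic types
typeof(NA_real_)      # "double"
typeof(NA_integer_)   # "integer"
is.numeric(NA_real_)  # TRUE

# A plain numeric literal such as 1 is a double, the same type as NA_real_
typeof(1)             # "double"
```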
Here is the example:
library(data.table)
# foo1 returns a double for every group
foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
# foo2 returns a double for group A (mean 3) but a logical NA for group B (mean 8)
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT <- data.table(ID=rep(c("A", "B"), each=5), value=1:10)
DT[, foo1(value), by=ID] # Works perfectly
#      ID V1
# [1,]  A  1
# [2,]  B  2
DT[, foo2(value), by=ID] # Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) :
#   columns of j don't evaluate to consistent types for each group: result for group 2 has column 1 type 'logical' but expecting type 'numeric'
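If the hypothesis is right, returning a missing value of the same type as the other group's result should avoid the error. A minimal sketch, assuming the data.table package is installed; foo3 is a hypothetical variant of foo2 that returns the double-typed NA_real_ instead of the bare logical NA:

```r
library(data.table)

# Hypothetical variant of foo2: the missing value is NA_real_ (a double),
# so every group returns the same type as the literal 1
foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}

DT <- data.table(ID = rep(c("A", "B"), each = 5), value = 1:10)
res <- DT[, foo3(value), by = ID]

# V1 is 1 for group A and NA for group B, and the column is of type double
print(res)
typeof(res$V1)
```

Unlike the improbable-number workaround, the result stays a genuine NA, so it can be filtered later with is.na(V1).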