I'm running into an issue when grouping and using which.max with R's data.table, and I'm not sure whether it's a bug or whether I'm misunderstanding how group-by works in data.table. I have a workaround; I'm just trying to understand why my initial attempt failed.
I'm looking at a table containing time series, and I want to get either (a) the time an event of interest occurred, or (b) the final time stamp in the time series. The column marking events is "NA" if an event did not occur, and "1" if it did.
Here's a minimal example to reproduce the issue:
library(data.table)
dt <- data.table(t = seq(9), event = c(NA, NA, NA, NA, 1, NA, 1, NA, NA), t_id = c(rep('A', 3), rep('B', 3), rep('C', 3)))
dt[, ifelse(is.null(which.max(event)), max(t), t[which.max(event)]), by=t_id]
This returns
t_id V1
A NA
B 5
C 7
The value for group "A" is NA, although I would naively expect it to be 3. If I run this without the ifelse function
dt[, t[which.max(event)], by=t_id]
the row for "A" is simply missing (I assumed this was because which.max returns NULL). But if I run
dt[, is.null(which.max(event)), by=t_id]
I get
t_id V1
A FALSE
B FALSE
C FALSE
What am I missing?
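One way to narrow this down is to reproduce group "A" outside data.table. The sketch below uses base R only and suggests that which.max returns a zero-length integer for an all-NA vector rather than NULL, which would explain both the FALSE from is.null and the NA from ifelse. The helper name event_or_last_time is hypothetical and only there for illustration; it is not part of data.table.

```r
# All-NA group, as in group "A" of the example
event <- c(NA, NA, NA)
t <- 1:3

idx <- which.max(event)   # zero-length integer, not NULL
is.null(idx)              # FALSE
length(idx)               # 0

# ifelse() returns a result shaped like its test (length 1 here),
# so the zero-length `no` value is padded out to NA
ifelse(is.null(idx), max(t), t[idx])

# Hypothetical helper: branch on length() with a plain if/else
event_or_last_time <- function(t, event) {
  idx <- which.max(event)
  if (length(idx) == 0L) max(t) else t[idx]
}

event_or_last_time(t, event)            # all-NA group: falls back to max(t)
event_or_last_time(4:6, c(NA, 1, NA))   # group "B": time of the event
```

Under the same assumption, the helper could be applied per group as dt[, event_or_last_time(t, event), by = t_id]. A plain if/else is used instead of ifelse because the condition is a single value per group, not a vector.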