I'm trying to merge a bunch of overlapping time periods in R using a data.table. I'm calling foverlaps to join the table against itself, which is efficient enough.
My problem is this: say period A overlaps period B, and B overlaps period C, but A does not overlap C. A single pass then does not group A with C, even though all three will eventually have to be merged into one period.
Currently I've got a while loop that finds overlaps and merges until no more merges occur, but that doesn't scale well. One solution I can see is applying the group indices to themselves recursively until the result stabilises, but that still seems to need a loop, and I want a completely vectorised solution.
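library(data.table)
dt = data.table(start = c(1, 2, 4, 6, 8, 10),
                end   = c(2, 3, 6, 8, 10, 12))
setkeyv(dt, c("start", "end"))
# Self-join: for each row, the index of the first row it overlaps
f = foverlaps(dt,
              dt,
              type = "any",
              mult = "first",
              which = TRUE)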
#Needs to return [1,1,3,3,3,3]
print(f)
#1 1 3 3 4 5
print(f[f])
#1 1 3 3 3 4
print(f[f][f])
#1 1 3 3 3 3
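To make the repeated indexing above concrete, here is a minimal sketch of the loop I'd like to avoid. The helper name collapse_groups is made up for illustration. It indexes the current result by itself (rather than by f each time), so each pass roughly doubles the reach, but it is still a loop:

```r
library(data.table)

dt <- data.table(start = c(1, 2, 4, 6, 8, 10),
                 end   = c(2, 3, 6, 8, 10, 12))
setkeyv(dt, c("start", "end"))

# For each row, the index of the first row it overlaps (self-join).
f <- foverlaps(dt, dt, type = "any", mult = "first", which = TRUE)

# Hypothetical helper: re-apply the index vector to itself
# until a pass changes nothing.
collapse_groups <- function(idx) {
  repeat {
    nxt <- idx[idx]
    if (identical(nxt, idx)) return(idx)
    idx <- nxt
  }
}

groups <- collapse_groups(f)
print(groups)
```

This relies on every row overlapping itself, so each entry of f points at the same row or an earlier one and the loop is guaranteed to stop.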
Can anyone help me with some ideas on vectorising this procedure?
Edit with IDs:
dt = data.table(id = c('A','A','A','A','A','B','B','B'),
                eventStart = c(1, 2, 4, 6, 8, 10, 11, 15),
                eventEnd   = c(2, 3, 6, 8, 10, 12, 14, 16))
setkeyv(dt, c("id", "eventStart", "eventEnd"))
# Same self-join, now restricted to rows sharing an id
f = foverlaps(dt,
              dt,
              type = "any",
              mult = "first",
              which = TRUE)
#Needs to return [1 1 3 3 3 6 6 8] or similar
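One direction that might avoid the loop entirely, and does not use foverlaps at all: with rows sorted by id and eventStart, a row starts a new group exactly when its start is greater than the running maximum of all earlier ends within the same id. This is only a sketch under that sorting assumption. The column names prevMaxEnd, newGroup and grp are made up for illustration, and touching periods (start equal to a previous end) are treated as overlapping, matching type="any":

```r
library(data.table)

dt <- data.table(id = c('A','A','A','A','A','B','B','B'),
                 eventStart = c(1, 2, 4, 6, 8, 10, 11, 15),
                 eventEnd   = c(2, 3, 6, 8, 10, 12, 14, 16))
setkeyv(dt, c("id", "eventStart", "eventEnd"))  # sorts by id, then start

# Within each id: the largest end seen in any earlier row
# (-Inf for the first row, so it always opens a group).
dt[, prevMaxEnd := shift(cummax(eventEnd), fill = -Inf), by = id]

# A row opens a new group when it starts after everything before it ended.
dt[, newGroup := eventStart > prevMaxEnd]

# Label each row with the row index of the first row in its group.
dt[, grp := which(newGroup)[cumsum(newGroup)]]

print(dt$grp)
```

On the sample data this gives the labels 1 1 3 3 3 6 6 8. I haven't checked how it behaves on large or unsorted input.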