
I have a number of intervals and need to find which ones form a continuous group.

In this MWE, I have Interval.id, Interval.start, and Interval.end, and I want to calculate Wanted.column.

library(data.table)
DT <- data.table(Interval.id=c(1L, 2L, 3L, 4L, 5L, 6L),
                 Interval.start=c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                 Interval.end=c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0),
                 Wanted.column=c(1L, 1L, 1L, 1L, 1L, 2L))

I suppose foverlaps() is the right tool here, but I can't see how to apply it.

How can Wanted.column be calculated?

Asked by Chris; edited by Henrik.
  • How do you define a continuous group? Could you elaborate on how you get the wanted column? I can't immediately see a sequence. – NelsonGon Sep 29 '19 at 06:55
  • Not sure how to say it, but if interval A intersects with B, and C with B, then A, B, and C are in the same group, even if A and C do not intersect. – Chris Sep 29 '19 at 06:57
  • See also: https://stackoverflow.com/questions/55836442/split-overlapping-intervals-into-non-overlapping-intervals-within-values-of-an Given the split-up data that this answer asks for, it should be trivial to find unions by comparing interval ends with subsequent interval starts (e.g. using `data.table::shift`). – Michael Sep 29 '19 at 20:38
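Chris's definition in the comments makes a group the transitive closure of "intersects with": the connected components of a graph whose edges join overlapping intervals. A brute-force base-R sketch that follows that definition literally can be useful for checking faster answers on small data. It builds an O(n^2) overlap matrix, so it is only meant for validation; the function name overlap_groups is hypothetical, and intervals are assumed closed (touching endpoints count as overlapping).

```r
# Hypothetical helper: label intervals by connected component of the
# "intervals i and j intersect" relation (closed intervals assumed).
overlap_groups <- function(start, end) {
  # overlap[i, j] is TRUE when [start[i], end[i]] and [start[j], end[j]] intersect
  overlap <- outer(start, end, "<=") & outer(end, start, ">=")
  label <- seq_along(start)
  repeat {
    # each interval takes the smallest label among the intervals it touches
    # (including itself); repeating this spreads the minimum through a component
    new_label <- apply(overlap, 1, function(row) min(label[row]))
    if (identical(new_label, label)) break
    label <- new_label
  }
  # renumber components 1, 2, ... in order of first appearance
  match(label, unique(label))
}

start <- c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5)
end   <- c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0)
print(overlap_groups(start, end))
# [1] 1 1 1 1 1 2
```

For example, [0, 2], [1, 4] and [3, 5] all land in one group even though the first and last do not intersect, which is exactly the A/B/C case described above.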

2 Answers

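# Assumes rows are ordered by Interval.start (true for the example data).
# A new group starts when an interval begins after the largest end among the previous rows.
DT[ , g := cumsum(
  cummax(shift(Interval.end, fill = Interval.end[1])) < Interval.start) + 1]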

#    Interval.id Interval.start Interval.end Wanted.column   g
# 1:           1            2.0          4.5             1   1
# 2:           2            3.0          3.5             1   1
# 3:           3            4.0          4.8             1   1
# 4:           4            4.6          5.0             1   1
# 5:           5            4.7          4.9             1   1
# 6:           6            5.5          8.0             2   2

Credit to these closely related answers: "Collapse rows with overlapping ranges" and "How to flatten / merge overlapping time periods".
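The one-liner works because, once rows are ordered by start, an interval opens a new group exactly when it starts after the largest end seen in all earlier rows; cumsum() then turns those "new group" flags into group numbers. A minimal base-R sketch of the same idea, which sorts first so unordered input is also handled (the function name cummax_groups is hypothetical):

```r
# Hypothetical helper: the same cummax()/cumsum() trick as the data.table
# answer, written in base R. Sorting by start is required for it to be valid.
cummax_groups <- function(start, end) {
  o <- order(start)
  s <- start[o]
  e <- end[o]
  # largest end among all earlier intervals (row 1 is compared with its own end)
  prev_max_end <- cummax(c(e[1], head(e, -1)))
  # TRUE where an interval starts after everything before it has ended
  new_group <- s > prev_max_end
  g <- integer(length(start))
  g[o] <- cumsum(new_group) + 1L   # map group numbers back to the input order
  g
}

start <- c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5)
end   <- c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0)
print(cummax_groups(start, end))
# [1] 1 1 1 1 1 2
```

With unsorted input the groups are numbered in order of start, not row order: cummax_groups(c(5.5, 2, 3), c(8, 4.5, 3.5)) gives 2 1 1. Using a strict `>` means an interval that starts exactly where the previous ones end joins their group, matching the strict `<` in the answer.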

Answered by Henrik

You can first create a data.table of the merged (grouped) intervals and then use foverlaps() to join the original intervals to it. The merged intervals can be created with the intervals package: its interval_union() function merges overlapping intervals into non-overlapping ones.

#use the intervals-package to create the "main" unique intervals
library( intervals )
DT.int <- as.data.table(
  intervals::interval_union( 
    intervals::Intervals( as.matrix( DT[, 2:3] ) ) , 
    check_valid = TRUE ) )
#set names
setnames( DT.int, names(DT.int), c("start", "end" ) )
#set group_id-column
DT.int[, group_id := .I ][]
#    start end group_id
# 1:   2.0   5        1
# 2:   5.5   8        2

#now perform foverlaps()
setkey( DT, Interval.start, Interval.end)
setkey( DT.int, start, end)
foverlaps( DT.int, DT )

#    Interval.id Interval.start Interval.end Wanted.column start end group_id
# 1:           1            2.0          4.5             1   2.0   5        1
# 2:           2            3.0          3.5             1   2.0   5        1
# 3:           3            4.0          4.8             1   2.0   5        1
# 4:           4            4.6          5.0             1   2.0   5        1
# 5:           5            4.7          4.9             1   2.0   5        1
# 6:           6            5.5          8.0             2   5.5   8        2

As you can see, the group_id column matches your Wanted.column.
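If you would rather not depend on the intervals package, the union step can be sketched in base R: sweep the intervals in start order, extend the current union while the next interval starts at or before its end, and otherwise open a new one. Because the resulting unions are disjoint and sorted, base R's findInterval() can stand in for the foverlaps() join. union_intervals is a hypothetical helper, and closed intervals are assumed, so touching intervals are merged.

```r
# Hypothetical helper: merge intervals into non-overlapping unions
# (the role interval_union() plays above). Closed intervals assumed.
union_intervals <- function(start, end) {
  o <- order(start)
  s <- start[o]
  e <- end[o]
  u_start <- s[1]
  u_end <- e[1]
  for (i in seq_along(s)[-1]) {
    k <- length(u_end)
    if (s[i] <= u_end[k]) {
      u_end[k] <- max(u_end[k], e[i])   # overlaps the current union: extend it
    } else {
      u_start <- c(u_start, s[i])       # gap: start a new union
      u_end <- c(u_end, e[i])
    }
  }
  data.frame(start = u_start, end = u_end, group_id = seq_along(u_start))
}

start <- c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5)
end   <- c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0)
u <- union_intervals(start, end)
print(u)
# Each interval lies inside exactly one union, so the last union starting
# at or before an interval's start is the one that contains it.
group <- u$group_id[findInterval(start, u$start)]
print(group)
# [1] 1 1 1 1 1 2
```

The unions come out as [2, 5] and [5.5, 8], the same rows as DT.int above.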

Answered by Wimpel