
I'm having difficulty counting interval overlaps the way I would expect. Here is an R data.table with intervals defined by start and end:

> library(data.table)
> dt1 = data.table(start=c(1, 5, 3), end=c(10, 15, 8))
> print(dt1)
   start end
1:     1  10
2:     5  15
3:     3   8

Here is how I would consider overlaps for these intervals, from 0 to 20:

[0, 1]: 0 (there are no intervals here)
[1, 3]: 1 (there is only one interval here, from [1, 10])
[3, 5]: 2 (two intervals here, both [1, 10] and [3, 8])
[5, 8]: 3
[8, 10]: 2
[10, 15]: 1
[15, 20]: 0

So, I would like to algorithmically output this. Something like:

   start end overlaps
1:     0   1        0
2:     1   3        1
3:     3   5        2
4:     5   8        3
5:     8  10        2
6:    10  15        1
7:    15  20        0

However, I cannot figure out how to do this with foverlaps() from data.table, or with the various functions in IRanges.

> setkey(dt1, start, end)
> foverlaps(dt1, dt1, type="any")
   start end i.start i.end
1:     1  10       1    10
2:     3   8       1    10
3:     5  15       1    10
4:     1  10       3     8
5:     3   8       3     8
6:     5  15       3     8
7:     1  10       5    15
8:     3   8       5    15
9:     5  15       5    15
> foverlaps(dt1, dt1, type="within")
   start end i.start i.end
1:     1  10       1    10
2:     1  10       3     8
3:     3   8       3     8
4:     5  15       5    15

Neither of these results helps with counting the overlaps within each sub-interval.

IRanges doesn't give the expected per-interval overlap counts either:

> library(IRanges)
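> range1 <- IRanges(start=c(1, 3, 5), end=c(10, 8, 15))  # assumed definition, matching the sorted dt1
> range1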
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        10        10
  [2]         3         8         6
  [3]         5        15        11
> countOverlaps(range1, range1)
[1] 3 3 3
> countOverlaps(range1, range1, type="within")
[1] 1 2 1

How does one count the overlapping intervals within each sub-interval?

ShanZhengYang
  • Can you add a reproducible example? The given dataset doesn't match what you provided in your question description (table2) – pogibas Apr 14 '19 at 20:55
  • @PoGibas Sorry, I don't understand. Which table? There's an error in `dt1`? – ShanZhengYang Apr 14 '19 at 20:56
  • *Here is how I would consider overlaps for these intervals, from 0 to 20* `dt1` has only three intervals – pogibas Apr 14 '19 at 20:58
  • @PoGibas Yes, these three intervals overlapping with each other. There are three intervals in `dt1`, i.e. `[1, 10], [5, 15], [3, 8]`. Does this make sense? – ShanZhengYang Apr 14 '19 at 21:00
  • So, as I understand from your question, the wanted result is `foverlaps(dt1, dt1)[, .N - 1, .(start, end)]` (`-1` to remove the overlap with itself) – pogibas Apr 14 '19 at 21:05
  • @PoGibas Are you sure the code snippet is correct? `foverlaps(dt1, dt1)[, .N - 1, .(start, end)]` gives an overlap of `2` for each of the three intervals in `dt1`. I'm trying to decompose all unions of these intervals and count the overlaps, like the example shows. Is my goal clear? – ShanZhengYang Apr 14 '19 at 21:11

1 Answer

> # Where do the 0 and the 20 come from?
> points <- c(0, sort(c(dt1$start, dt1$end)), 20)
> x <- do.call(IRanges,
+              transpose(Map(c, start=head(points, -1), end=tail(points, -1))))
> x
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         0         1         2
  [2]         1         3         3
  [3]         3         5         3
  [4]         5         8         4
  [5]         8        10         3
  [6]        10        15         6
  [7]        15        20         6
> y <- do.call(IRanges, dt1)
> y
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        10        10
  [2]         3         8         6
  [3]         5        15        11
> countOverlaps(x, y, type="within")
[1] 0 1 2 3 2 1 0

Note that the 5th result is 2, since [8, 10] overlaps with both [1, 10] and [5, 15].
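Since the question asked about foverlaps(), here is a minimal sketch of the same idea using only data.table. It assumes the outer bounds 0 and 20 from the question; the names `segs`, `hits`, and `res` are made up for illustration. `unique()` is added so that repeated endpoints would not produce zero-width segments.

```r
library(data.table)

dt1 <- data.table(start = c(1, 5, 3), end = c(10, 15, 8))
setkey(dt1, start, end)  # foverlaps() needs the second table to be keyed

# Breakpoints: every endpoint plus the assumed outer bounds 0 and 20.
points <- c(0, sort(unique(c(dt1$start, dt1$end))), 20)

# Consecutive breakpoints form the segments to count over.
segs <- data.table(start = head(points, -1), end = tail(points, -1))

# For each segment, look up the intervals of dt1 that fully contain it.
# Segments with no match are kept, with NA in the dt1 columns.
hits <- foverlaps(segs, dt1, type = "within")

# Count the non-NA matches per segment (i.start/i.end are the segment bounds).
res <- hits[, .(overlaps = sum(!is.na(start))), by = .(i.start, i.end)]
setnames(res, c("i.start", "i.end"), c("start", "end"))
print(res)
```

Here `type = "within"` plays the same role as in the `countOverlaps()` call: a segment counts toward an interval only if it lies entirely inside it.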

d125q
  • Thanks for this. I corrected the typo as well. Could you explain a bit how you created `x`? – ShanZhengYang Apr 15 '19 at 00:44
  • I basically constructed the vector `c(0, 1, 3, 5, 8, 10, 15, 20)` out of the `data.table`, then "zipped" this vector with itself, resulting in `c(0, 1), c(1, 3), ..., c(15, 20)`. – d125q Apr 15 '19 at 08:09
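The "zipping" step described in the last comment can be sketched in base R as follows. This is only an illustration: the variable names are hypothetical, the bounds 0 and 20 are taken from the question, and the direct comparison in `mapply()` stands in for `countOverlaps(..., type="within")`.

```r
starts <- c(1, 5, 3)
ends   <- c(10, 15, 8)

# All breakpoints, padded with the assumed outer bounds 0 and 20.
points <- c(0, sort(c(starts, ends)), 20)

# "Zip" the vector with itself: drop the last element for the left ends,
# drop the first element for the right ends.
segs <- data.frame(start = head(points, -1), end = tail(points, -1))

# Count the intervals that fully contain each segment.
segs$overlaps <- mapply(function(s, e) sum(starts <= s & ends >= e),
                        segs$start, segs$end)
print(segs)
```

Pairing `head(points, -1)` with `tail(points, -1)` yields one row per pair of adjacent breakpoints, so the segments cover the whole range from 0 to 20.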