How to combine intervals data into fewer intervals in R?

Question

I am trying to collapse a series of intervals into fewer, equally meaningful intervals.

Consider for example this list of intervals

Intervals = list(
  c(23,34),
  c(45,48),
  c(31,35),
  c(7,16),
  c(5,9),
  c(56,57),
  c(55,58)
)

Because the intervals overlap, the same intervals can be described with few vectors. Plotting these intervals make obvious that a list of 4 vectors would be enough

plot(1,1,type="n",xlim=range(unlist(Intervals)),ylim=c(0.9,1.1))
segments(
    x0=sapply(Intervals,"[",1),
    x1=sapply(Intervals,"[",2),
    y0=rep(1,length(Intervals)),
    y1=rep(1,length(Intervals)),
    lwd=10
    )

How can I reduce my Intervals list to carry the same info than the one displayed on the plot? (performance matter)

The desired outputs for the above example is

Intervals = list(
  c(5,16)
  c(23,35),
  c(45,48),
  c(55,58)
)

Not the same, but _possibly_ some relevant pieces in [this Q&A](http://stackoverflow.com/questions/27574775/is-it-possible-to-use-the-r-data-table-funcion-foverlaps-to-find-the-intersectio) — Henrik, Jan 26 '16 at 08:00
@sindri_baldur Feel free to vote to rollback, but Remi's field is genetics, and accepted solution from Ven (also genetics) is from bioconductor IRranges package. Plus I came across to this post form other bioinformatics related post. — zx8754, Jun 26 '19 at 10:22

score 4 · Accepted Answer · answered Jan 26 '16 at 03:41

4

What you need is the reduce function in the IRanges package.

In.df <- do.call(rbind, Intervals)
library(IRanges)

In.ir <- IRanges(In.df[, 1], In.df[,2])

out.ir <- reduce(In.ir)
out.ir
# IRanges of length 4
#     start end width
# [1]     5  16    12
# [2]    23  35    13
# [3]    45  48     4
# [4]    55  58     4

answered Jan 26 '16 at 03:41

Ven Yao

3,680
2
27
42

IRanges seems like a very handy packages for my need. Thanks a lot! +1 – Remi.b Jan 26 '16 at 16:53

score 0 · Answer 2 · answered Jan 26 '16 at 04:07

One option with base R:

First I put your list in a data.frame:

ints <- as.data.frame(do.call(rbind, Intervals))
names(ints) <- c('start', 'stop')

so it looks like

  start stop
1    23   34
2    45   48
3    31   35
4     7   16
5     5    9
6    56   57
7    55   58

Now, two for loops compare with between, and expand an interval when a crossover is found:

for(x in 1:nrow(ints)){
  for(y in 1:nrow(ints)){
    if(between(ints$start[x], ints$start[y], ints$stop[y])){
      ints$start[x] <- ints$start[y]
      if(ints$stop[y] > ints$stop[x]){
        ints$stop[x] <- ints$stop[y]
      } else {
        ints$stop[y] <- ints$stop[x]
      }
    }
  }
}

which alters ints to

> ints
  start stop
1    23   35
2    45   48
3    23   35
4     5   16
5     5   16
6    55   58
7    55   58

Simplify to unique cases:

ints <- unique(ints, margin = 1)

and put them in order

ints <- ints[order(ints$start),]

which leaves you with

> ints
  start stop
4     5   16
1    23   35
2    45   48
6    55   58

If you want it back in a list like the original,

Intervals <- lapply(1:nrow(ints), function(x)c(ints[x,1], ints[x,2]))

(Note: You can certainly do this with *apply instead of for, Booleans instead of between, and the original list instead of a data.frame, but, well, this is readable. Rewrite/optimize as you like.)

Side note: `stop` is probably a bad name for a variable, but I got too far in to change it. Sorry! — alistaire, Jan 26 '16 at 04:10
Your comment is totally unclear. `stop` is not a variable, it is the name of a column. — , Jan 26 '16 at 04:26
From `?data.frame`: "A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame"." That said, `stop` is a bad name regardless. — alistaire, Jan 26 '16 at 04:35

score 0 · Answer 3 · answered Mar 06 '23 at 21:13

With ivs and iv_groups() for merging overlapping intervals within a single interval vector

library(ivs)
library(purrr)

x = list(
  c(23,34),
  c(45,48),
  c(31,35),
  c(7,16),
  c(5,9),
  c(56,57),
  c(55,58)
)

x <- list_transpose(x)
x
#> [[1]]
#> [1] 23 45 31  7  5 56 55
#> 
#> [[2]]
#> [1] 34 48 35 16  9 57 58

x <- iv(x[[1]], x[[2]])
x
#> <iv<double>[7]>
#> [1] [23, 34) [45, 48) [31, 35) [7, 16)  [5, 9)   [56, 57) [55, 58)

iv_groups(x)
#> <iv<double>[4]>
#> [1] [5, 16)  [23, 35) [45, 48) [55, 58)

The benefit of this over the great IRanges package is that it also works natively on any vector type supported in the tidyverse, like dates.

score 0 · Answer 4 · answered Apr 21 '23 at 06:51

I use following function to merge intervals

merge_intervals <- function(start, end) {
    order_idx <- order(start, end)
    start <- start[order_idx]
    end <- end[order_idx]
    len <- length(start)
    if (len >= 2L) {
        groups <- cumsum(c(0L, end[1:(len - 1L)] < start[-1L]))
        if (anyDuplicated(groups)) {
            groups <- factor(groups)
            start <- vapply(split(start, groups), min,
                numeric(1L),
                USE.NAMES = FALSE
            )
            end <- vapply(split(end, groups), max,
                numeric(1L),
                USE.NAMES = FALSE
            )
            Recall(start, end)
        } else {
            list(start, end)
        }
    } else {
        list(start, end)
    }
}

The output corresponding the start and end of the intervals

    x <- list(
        c(23, 34),
        c(45, 48),
        c(31, 35),
        c(7, 16),
        c(5, 9),
        c(56, 57),
        c(55, 58)
    )
    x <- data.table::transpose(x)
    merge_intervals(x[[1L]], x[[2L]])
#> [[1]]
#> [1]  5 23 45 55
#> 
#> [[2]]
#> [1] 16 35 48 58

How to combine intervals data into fewer intervals in R?

4 Answers4

Linked