
I have two sets of IRanges to compare pairwise. My goal is an output that gives the position of each overlap where one exists and, where the two ranges do not overlap, the offset between them listed as a negative start. At the very least, if I can't get the offset, I would like a "0" to indicate that there is no overlap. For example:

xx<-IRanges(start=c(2,9,19,31,45), end=c(3,11,23,35,49))

IRanges of length 5
     start end width
[1]      2   3     2
[2]      9  11     3
[3]     19  23     5
[4]     31  35     5
[5]     45  49     5

and

yy<-IRanges(start=c(4,10,19,33,45), end=c(5,13,25,38,48))

IRanges of length 5
     start end width
[1]      4   5     2
[2]     10  13     4
[3]     19  25     7
[4]     33  38     6
[5]     45  48     4

Using findOverlaps + ranges gives me:

> fo <-findOverlaps(xx,yy)
> ranges(fo, xx, yy)
IRanges of length 4
    start end width
[1]    10  11     2
[2]    19  23     5
[3]    33  35     3
[4]    45  48     4

I would like the final output to be a dataframe or something that would look like this:

       start end width
[1]     -1   0     0
[2]     10  11     2
[3]     19  23     5
[4]     33  35     3
[5]     45  48     4

I am able to get the indexes of the ranges that overlap using countOverlaps and the hits object for the comparison using findOverlaps + ranges but am at a loss as to how to combine the results to get the desired output.

user2909302
  • It's more likely that we will be able to help you if you make a minimal reproducible example to go along with your question. Something we can work from and use to show you how it might be possible to solve your problem. You can have a look at [this SO post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example in R. – Eric Fail Dec 29 '15 at 19:46
  • Great reproducible example. A spontaneous question. Have you tried tweaking the `maxgap` in the `findOverlaps()` call? – Eric Fail Dec 29 '15 at 22:55
  • @EricFail when i played with maxgap it ended up erroring out on the ranges call. – user2909302 Dec 30 '15 at 14:57

2 Answers

library(IRanges)

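# Overlap of one pair of ranges; a and b are one-row data frames
# with start and end columns.
f <- function(a, b)
{
  s <- max(a$start, b$start)   # latest start
  e <- min(a$end, b$end)       # earliest end

  if ( s <= e )
  {
    # The ranges overlap on [s, e].
    ovlp <- c( start = s,
               end   = e,
               width = e - s + 1 )
  } else
  {
    # No overlap: e - s is negative (-1 when the ranges are adjacent).
    ovlp <- c( start = e - s,
               end   = 0,
               width = NA )
  }

  return(ovlp)
}

# Pairwise comparison: X[i] is compared with Y[i] for every i.
findOvlp <- function( X, Y )
{
  if ( length(X) != length(Y) ){ stop("length(X) != length(Y)") }

  n <- length(X)

  X.df <- as.data.frame(X)
  Y.df <- as.data.frame(Y)

  Z <- data.frame( start = rep(NA, n),
                   end   = rep(NA, n),
                   width = rep(NA, n) )

  # seq_len(n) also handles empty input, unlike 1:n.
  for ( i in seq_len(n) ) { Z[i,] <- f(X.df[i,], Y.df[i,]) }

  return( Z )
}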

Example:

> xx<-IRanges(start=c(2,9,19,31,45), end=c(3,11,23,35,49))

> yy<-IRanges(start=c(4,10,19,33,45), end=c(5,13,25,38,48))

> findOvlp(xx,yy)
  start end width
1    -1   0    NA
2    10  11     2
3    19  23     5
4    33  35     3
5    45  48     4
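
The loop over rows can also be avoided. Below is a minimal sketch, assuming the ranges are passed as plain start and end vectors (for IRanges objects these could come from start() and end()). The name pairOverlap is hypothetical, not an IRanges function, and unlike the function above it reports width 0 rather than NA for pairs that do not overlap.

```r
# Hypothetical helper (not part of IRanges): vectorised pairwise overlap.
# sx/ex and sy/ey are the start/end vectors of the two sets of ranges.
pairOverlap <- function(sx, ex, sy, ey)
{
  stopifnot(length(sx) == length(ex),
            length(sy) == length(ey),
            length(sx) == length(sy))

  s   <- pmax(sx, sy)   # latest start of each pair
  e   <- pmin(ex, ey)   # earliest end of each pair
  hit <- s <= e         # TRUE where the pair overlaps

  data.frame( start = ifelse(hit, s, e - s),      # negative offset if no overlap
              end   = ifelse(hit, e, 0),
              width = ifelse(hit, e - s + 1, 0) )
}

res <- pairOverlap( c(2, 9, 19, 31, 45),  c(3, 11, 23, 35, 49),
                    c(4, 10, 19, 33, 45), c(5, 13, 25, 38, 48) )
print(res)
```

For the example data the first row comes out as start -1, end 0, width 0, and the remaining rows match the table above.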
mra68

I think what you need is the pintersect function in IRanges.

library(IRanges)
xx <- IRanges(start=c(2,9,19,31,45), end=c(3,11,23,35,49))
yy <- IRanges(start=c(4,10,19,33,45), end=c(5,13,25,38,48))

pintersect(xx, yy)
# IRanges of length 5
#     start end width
# [1]     4   3     0
# [2]    10  11     2
# [3]    19  23     5
# [4]    33  35     3
# [5]    45  48     4

A width of 0 indicates that the pair does not overlap.
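
A zero width only flags that there is no overlap; it does not say how far apart the two ranges are. As a rough sketch, the gap can be computed from the starts and ends directly. The helper name pairGap is hypothetical, and the inputs are assumed to be plain start and end vectors of equal length.

```r
# Hypothetical helper (not part of IRanges): number of positions lying
# strictly between the two ranges of each pair; 0 when they overlap
# or are adjacent.
pairGap <- function(sx, ex, sy, ey)
{
  pmax(pmax(sx, sy) - pmin(ex, ey) - 1, 0)
}

# The example data: the first pair is adjacent, the others overlap.
gap <- pairGap( c(2, 9, 19, 31, 45),  c(3, 11, 23, 35, 49),
                c(4, 10, 19, 33, 45), c(5, 13, 25, 38, 48) )

# A made-up pair that is further apart: 1-3 and 10-12 (positions 4 to 9 between).
far <- pairGap(1, 3, 10, 12)
```

Because adjacent and overlapping pairs both give a gap of 0, the width column is still needed to tell the two cases apart.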

Ven Yao
  • Thank you - I didn't know about that function. The only thing is that I need to also know the gap between the non-overlapping ranges. – user2909302 Dec 30 '15 at 02:15