
I have found other posts on finding overlapping ranges with IRanges in R, but could you help me with an extra twist? I have two ranges that are linked (a possible genomic rearrangement with a start range and an end range), and I would like to filter out the rearrangements that are also present in the mother genome.

I have ranges for the start and end of each rearrangement as shown below (chromosome number, start of interval, end of interval): the three columns on the left give the start of the rearrangement and the three columns on the right give its end. They are the output of a program called SVDetect, which uses NGS data to find mate pairs that align abnormally to a reference genome. I have two genomes, the mother clone and the daughter, and would like to find rearrangements that are unique to the daughter. In other words, I would like to filter out daughter rows where both ranges overlap with the two ranges of the SAME row in the mother. The ranges might differ slightly, but if both overlap it strongly indicates that the rearrangement was already present in the mother. IRanges in R makes it easy to see whether a range overlaps other ranges, but I have not been able to find a solution that shows me WHICH range it overlapped with, other than a very slow for-loop.

Daughter:

1  1384138 1384862 - 1  516731  516918
2  3758860 3759278 - 2  879828  879966 # (filter away this line as overlap with below)
2  3940051 3940470 - 2  3940856 3941250

Mother:

2  3758858 3759282 - 2  879828  879966 # (overlap with this range)
1  1384138 1384862 - 3  116231  516918
2  3940051 3940470 - 3  1540856 3941250
Arun

1 Answer


The trick is to use two sets of GRanges, one for the rearrangement start and one for the rearrangement end, and then combine the results as follows:

### Load the package that provides GRanges(), IRanges() and findOverlaps()
library(GenomicRanges)

### Create GRanges for daughter - copied from example
daughterStart <- GRanges(c(1,2,2), IRanges(c(1384138,3758860,3940051), c(1384862,3759278,3940470)))
daughterEnd <- GRanges(c(1,2,2), IRanges(c(516731,879828,3940856), c(516918,879966,3941250)))

### Create GRanges for mother - copied from example
motherStart <- GRanges(c(2,1,2), IRanges(c(3758858,1384138,3940051), c(3759282,1384862,3940470)))
motherEnd <- GRanges(c(2,3,3), IRanges(c(879828,116231,1540856), c(879966,516918,3941250)))
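With real SVDetect output the ranges would normally come from a table rather than being typed in by hand. A minimal sketch, assuming the six columns have been read into a data frame; the column names `chr1`, `start1`, `end1`, `chr2`, `start2`, `end2` and the helper `makeSide()` are made up for illustration and are not part of GenomicRanges:

```r
library(GenomicRanges)

# Hypothetical column names; adapt them to however the SVDetect output is read in.
daughter <- data.frame(
  chr1   = c("1", "2", "2"),
  start1 = c(1384138, 3758860, 3940051),
  end1   = c(1384862, 3759278, 3940470),
  chr2   = c("1", "2", "2"),
  start2 = c(516731, 879828, 3940856),
  end2   = c(516918, 879966, 3941250),
  stringsAsFactors = FALSE
)

# Illustrative helper: build one GRanges object for one side of the rearrangement.
makeSide <- function(df, chr, start, end) {
  GRanges(seqnames = df[[chr]],
          ranges   = IRanges(start = df[[start]], end = df[[end]]))
}

daughterStart <- makeSide(daughter, "chr1", "start1", "end1")
daughterEnd   <- makeSide(daughter, "chr2", "start2", "end2")
```

The mother table can be converted the same way, which keeps row i of the start object and row i of the end object describing the same rearrangement.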

Then we identify overlaps with the findOverlaps() function, using the daughter as the query since we are asking whether the daughter's rearrangements overlap the mother's (suppressWarnings() is used because the GRanges objects have different seqlevels (chromosomes), which triggers a warning):

starOverlap <- suppressWarnings( findOverlaps(query = daughterStart, subject = motherStart) ) # suppressWarnings to ignore warnings about different chromosomes
endOverlap  <- suppressWarnings( findOverlaps(query = daughterEnd,   subject = motherEnd  ) )
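The object returned by findOverlaps() already answers the "WHICH range" part of the question: each hit pairs a query index with a subject index. A small sketch for the start ranges only, using the `queryHits()` and `subjectHits()` accessors; the names `hits` and `hitTable` are just for illustration:

```r
library(GenomicRanges)

daughterStart <- GRanges(c("1", "2", "2"),
                         IRanges(c(1384138, 3758860, 3940051),
                                 c(1384862, 3759278, 3940470)))
motherStart   <- GRanges(c("2", "1", "2"),
                         IRanges(c(3758858, 1384138, 3940051),
                                 c(3759282, 1384862, 3940470)))

hits <- findOverlaps(query = daughterStart, subject = motherStart)

# One row per overlapping pair: which daughter row hit which mother row.
hitTable <- data.frame(daughterRow = queryHits(hits),
                       motherRow   = subjectHits(hits))
print(hitTable)
```

For the example data every daughter start overlaps exactly one mother start (daughter rows 1, 2, 3 pair with mother rows 2, 1, 3), which is why the start ranges alone cannot tell the rearrangements apart.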

Lastly, we identify which hits (the same daughter row paired with the same mother row) occur in both the rearrangement-start and the rearrangement-end overlaps:

> starOverlap %in% endOverlap
[1] FALSE  TRUE FALSE

This can be used to get the indexes of the daughter pairs that do NOT overlap, simply by adding a ! (note that this only considers daughter rows that have at least one hit in starOverlap):

> queryHits(starOverlap)[ ! (starOverlap %in% endOverlap) ]
[1] 1 3

Since this approach relies on findOverlaps() and is vectorized, it should remain fast even for millions of rearrangements.
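For data where a daughter start range has no overlap at all in the mother, or overlaps several mother rows, indexing the start hits directly can miss or mislabel rows. A slightly more defensive sketch of the same idea keeps every daughter row that has no hit shared between the start and end overlaps. The function name `uniqueToDaughter()` and the extra fourth daughter row are made up for illustration:

```r
library(GenomicRanges)

# Illustrative helper: indexes of daughter rows with no mother row
# overlapping BOTH the start range and the end range.
uniqueToDaughter <- function(dStart, dEnd, mStart, mEnd) {
  startHits <- suppressWarnings(findOverlaps(query = dStart, subject = mStart))
  endHits   <- suppressWarnings(findOverlaps(query = dEnd,   subject = mEnd))
  # Hits found on both sides: same daughter row AND same mother row.
  both <- startHits[startHits %in% endHits]
  # Everything else is unique to the daughter, including rows with no hits.
  setdiff(seq_along(dStart), queryHits(both))
}

# Example data from the question plus one made-up row on chromosome 4
# that has no counterpart in the mother.
daughterStart <- GRanges(c("1", "2", "2", "4"),
                         IRanges(c(1384138, 3758860, 3940051, 100),
                                 c(1384862, 3759278, 3940470, 200)))
daughterEnd   <- GRanges(c("1", "2", "2", "4"),
                         IRanges(c(516731, 879828, 3940856, 300),
                                 c(516918, 879966, 3941250, 400)))
motherStart   <- GRanges(c("2", "1", "2"),
                         IRanges(c(3758858, 1384138, 3940051),
                                 c(3759282, 1384862, 3940470)))
motherEnd     <- GRanges(c("2", "3", "3"),
                         IRanges(c(879828, 116231, 1540856),
                                 c(879966, 516918, 3941250)))

keep <- uniqueToDaughter(daughterStart, daughterEnd, motherStart, motherEnd)
print(keep)  # daughter rows 1, 3 and 4 are kept; row 2 is filtered out
```

The result can be used to subset the original daughter table, for example `daughterStart[keep]` and `daughterEnd[keep]`.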