
I've looked around and it doesn't seem like any questions have been posted about this before. I have two GRanges objects with some coordinates, and I would like to subtract the intervals of one from the other. This is different from finding overlaps with findOverlaps() or using intersect().

For instance:

granges.in 

seqnames ranges$start ranges$end
chr01 1100 2000
chr01 2100 3000
chr02 1000 4000
chr03 1500 3500

granges.out

seqnames ranges$start ranges$end
chr01 1000 1200
chr02 2500 3000
chr03 1500 2000
chr03 3000 3500

and I want:

granges.ref

seqnames ranges$start ranges$end
chr01 1200 2000
chr01 2100 3000
chr02 1000 2500
chr02 3000 4000
chr03 2000 3000

The following works, but it's pretty clumsy, and I would have to do it chromosome by chromosome because the number of intervals per chromosome does not match between the two objects.

setdiff(ranges(granges.in[seqnames(granges.in) == "chr01"]),
        ranges(granges.out[seqnames(granges.out) == "chr01"]))

Is there a quicker, more effective way to do this using the two GRanges objects as a whole?

zx8754
user3245575
  • you should ask https://www.biostars.org/ – Pierre Jul 10 '14 at 13:30
  • `setdiff` also works on GRanges objects - is that not what you want? – Gavin Kelly Jul 11 '14 at 08:17
  • Hi Gavin thanks for the suggestion. I've tried that but it gives only non overlapping ranges until it reaches the same number of ranges as the first item in setdiff() - it doesn't actually change the start and end positions of each interval – user3245575 Jul 11 '14 at 09:50
  • Odd, if I do `x <- GRanges(c(1,1,2,3), IRanges(c(1100,2100,1000,1500), c(2000,3000,4000,3500)))` and `y <- GRanges(c(1,2,3,3), IRanges(c(1000,2500,1500,3000), c(1200,3000,2000,3500)))` then `setdiff(x,y)` gives roughly what you requested. Does it not, for you - maybe check you've got the latest version of the packages? – Gavin Kelly Jul 11 '14 at 15:18
  • Yes, that only seems to work if the total number of entries is the same in the two GRanges objects... – user3245575 Jul 15 '14 at 09:26
  • Have you tried using the `ignore.strand=TRUE` argument in setdiff()? Strand can cause a problem like the one described above, since setdiff is strand-specific (which is also why it works when you do it on the ranges only: you remove the strand information). – Kristoffer Vitting-Seerup Dec 15 '14 at 11:57

1 Answer

setdiff(x, y, ignore.strand=TRUE) 

can work. Alternatively, the plyranges library makes GRanges objects quite easy to work with as well.
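
As a minimal sketch, here is that call applied to the example data from the question. The object names granges.in, granges.out and granges.ref are taken from the question; building them with GRanges() and IRanges() this way is an assumption about how the original objects were created. The example ranges carry no strand, so ignore.strand = TRUE makes no difference here, but it matters when the two real objects have different strand annotations.

```r
library(GenomicRanges)

# Intervals to keep (the question's granges.in)
granges.in <- GRanges(
  seqnames = c("chr01", "chr01", "chr02", "chr03"),
  ranges   = IRanges(start = c(1100, 2100, 1000, 1500),
                     end   = c(2000, 3000, 4000, 3500))
)

# Intervals to subtract (the question's granges.out)
granges.out <- GRanges(
  seqnames = c("chr01", "chr02", "chr03", "chr03"),
  ranges   = IRanges(start = c(1000, 2500, 1500, 3000),
                     end   = c(1200, 3000, 2000, 3500))
)

# Subtract across all chromosomes at once, ignoring strand
granges.ref <- setdiff(granges.in, granges.out, ignore.strand = TRUE)
granges.ref
```

GRanges intervals are closed integer ranges, so the boundaries come out one base away from the values listed in the question: chr01 1201-2000 and 2100-3000, chr02 1000-2499 and 3001-4000, chr03 2001-2999.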