I want to do something similar to the solution in this thread: I have two data frames, and I want to find the regions that overlap and then merge the corresponding data onto the hits.
> x1
  chr start stop CN
1   1    10  140  G
2   1   100 1000  G
3   1  1500 5000  L
> x2
  chr start stop gene
1   1     1  100    a
2   1   100  150    b
3   1   190 1000    c
4   1  1000 2000    d
5   1  2000 5000    e
I can find the regions that overlap with the following code:
library(GenomicRanges)
gr1 = with(x1, GRanges(chr, IRanges(start=start, end=stop)))
gr2 = with(x2, GRanges(chr, IRanges(start=start, end=stop)))
hits = findOverlaps(gr1, gr2)
The hits show which regions in x1 overlap with which regions in x2, e.g.:
> hits
Hits of length 8
queryLength: 3
subjectLength: 5
      queryHits subjectHits
      <integer>   <integer>
    1         1           1
    2         1           2
    3         2           1
    4         2           2
    5         2           3
    6         2           4
    7         3           4
    8         3           5
What I would like instead is for the output to include both the CN info from x1 and the gene info from x2, with one row per hit. The output would look like this:
  x1chr x1start x1stop x1CN x2chr x2start x2stop x2gene
1     1      10    140    G     1       1    100      a
2     1      10    140    G     1     100    150      b
3     1     100   1000    G     1       1    100      a
4     1     100   1000    G     1     100    150      b
5     1     100   1000    G     1     190   1000      c
6     1     100   1000    G     1    1000   2000      d
7     1    1500   5000    L     1    1000   2000      d
8     1    1500   5000    L     1    2000   5000      e
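
For reference, here is a rough sketch of one way this table might be built. It assumes the row order of x1 and x2 matches the order of gr1 and gr2, so the indices in hits can be used to subset the original data frames directly. The name `merged` and the "x1"/"x2" column prefixes are only illustrative, and the data frames are retyped from the printed tables above (with chr stored as a character, which is my assumption).

```r
library(GenomicRanges)

# Example data retyped from the printed tables (chr as character is an assumption)
x1 <- data.frame(chr = "1",
                 start = c(10, 100, 1500),
                 stop  = c(140, 1000, 5000),
                 CN    = c("G", "G", "L"))
x2 <- data.frame(chr = "1",
                 start = c(1, 100, 190, 1000, 2000),
                 stop  = c(100, 150, 1000, 2000, 5000),
                 gene  = c("a", "b", "c", "d", "e"))

# Same overlap search as above
gr1 <- with(x1, GRanges(chr, IRanges(start = start, end = stop)))
gr2 <- with(x2, GRanges(chr, IRanges(start = start, end = stop)))
hits <- findOverlaps(gr1, gr2)

# queryHits() indexes rows of x1, subjectHits() indexes rows of x2.
# Subset each data frame by its hit indices, prefix the column names,
# and bind the two pieces side by side (one row per hit).
merged <- cbind(
  setNames(x1[queryHits(hits), ],   paste0("x1", names(x1))),
  setNames(x2[subjectHits(hits), ], paste0("x2", names(x2)))
)
rownames(merged) <- NULL  # drop the duplicated row names left over from subsetting
merged
```

Because the subsetting is done purely by row index, this only works if x1 and x2 have not been reordered or filtered after gr1 and gr2 were created.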