I have two large data frames that look like this:
library(tibble)

df1 <- tibble(chrom = c(1, 1, 1, 2, 2, 2),
              start = c(100, 200, 300, 100, 200, 300),
              end = c(150, 250, 350, 120, 220, 320))
df2 <- tibble(chrom = c(1, 1, 1, 2, 2, 2),
              start2 = c(100, 50, 280, 100, 10, 200),
              end2 = c(125, 100, 320, 115, 15, 350))
df1
#> # A tibble: 6 × 3
#>   chrom start   end
#>   <dbl> <dbl> <dbl>
#> 1     1   100   150
#> 2     1   200   250
#> 3     1   300   350
#> 4     2   100   120
#> 5     2   200   220
#> 6     2   300   320
df2
#> # A tibble: 6 × 3
#>   chrom start2  end2
#>   <dbl>  <dbl> <dbl>
#> 1     1    100   125
#> 2     1     50   100
#> 3     1    280   320
#> 4     2    100   115
#> 5     2     10    15
#> 6     2    200   350
Created on 2023-01-09 with reprex v2.0.2
I want to find which ranges [start2, end2] in df2 overlap with the ranges [start, end] in df1. Ideally the output would look something like the table below, but that exact layout isn't required; mainly I want the coordinates of the overlapping ranges.
#> # A tibble: 6 × 8
#>   chrom start   end start2  end2 overlap overlap_start overlap_end
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <chr>   <chr>         <chr>
#> 1     1   100   150    100   125 yes     100           125
#> 2     1   200   250     50   100 no      <NA>          <NA>
#> 3     1   300   350    280   320 yes     300           320
#> 4     2   100   120    100   115 yes     100           115
#> 5     2   200   220     10    15 no      <NA>          <NA>
#> 6     2   300   320    200   350 yes     200,220       300,320
Note that in the last row, the df2 range 200-350 overlaps two ranges from df1 (200-220 and 300-320).
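To make the overlap rule concrete, here is a minimal base R sketch of one way it could be expressed. It rests on assumptions that are not stated above: the ranges are treated as closed intervals (so ranges that only touch at one coordinate count as overlapping), every df1 range is compared with every df2 range on the same chrom rather than row by row, and the helper names `pairs` and `hits` are made up for illustration. The all-pairs `merge()` is only meant to show the logic; it will probably not scale to large data frames.

```r
# Base R only; the example data is rebuilt as plain data frames
# so this block runs on its own.
df1 <- data.frame(chrom = c(1, 1, 1, 2, 2, 2),
                  start = c(100, 200, 300, 100, 200, 300),
                  end   = c(150, 250, 350, 120, 220, 320))
df2 <- data.frame(chrom  = c(1, 1, 1, 2, 2, 2),
                  start2 = c(100, 50, 280, 100, 10, 200),
                  end2   = c(125, 100, 320, 115, 15, 350))

# Pair every df1 range with every df2 range on the same chromosome.
pairs <- merge(df1, df2, by = "chrom")

# Assumption: closed intervals. Two ranges overlap when each one
# starts no later than the other one ends.
hits <- pairs[pairs$start <= pairs$end2 & pairs$start2 <= pairs$end, ]

# The shared stretch runs from the later start to the earlier end.
hits$overlap_start <- pmax(hits$start, hits$start2)
hits$overlap_end   <- pmin(hits$end, hits$end2)

hits
```

Under these assumptions the df2 range 200-350 on chrom 2 produces two rows, one for each df1 range it covers, instead of comma-separated values in a single row. The df2 range 50-100 on chrom 1 is also reported as touching the df1 range 100-150 at coordinate 100; replacing `<=` with `<` would exclude such single-coordinate touches.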