I have two large data sets with start and end positions, and I want to find the ranges they have in common based on these positions. I do not want only the rows whose positions match exactly. For example, if the first file has a range from 1 to 300000 and the second file has a range from 100000 to 210000, the range in the second file lies inside the range in the first file, and I want to count this as common. The same applies when only one end of a range is covered by a range in the other file. Here is some example data:
library(data.table)  # needed for setDT()
df1 <- setDT(data.frame(name = c("chr1", "chr2", "chr3", "chr4"), START = c(1, 300000, 470000, 800000), END = c(200000, 370000, 500000, 990000), STRAND = c("+", "+", "+", "-")))
df2 <- setDT(data.frame(name = c("chr1", "chr2", "chr3", "chr4"), START = c(55000, 365000, 372000, 750000), END = c(187000, 371000, 469000, 835000), STRAND = c("+", "+", "+", "-")))
Therefore: chr1 in df2 lies entirely inside chr1 in df1, so it is common. chr2 in df2 is common as well, because its start position (365000) falls inside the df1 range. chr3 is not common, because the df2 range ends (469000) before the df1 range starts (470000). chr4 is common, because the end position in df2 (835000) falls inside the df1 range (800000 to 990000).
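To make the rule concrete: two ranges with the same name count as common when they share at least one position, that is, when each one starts at or before the other one ends. Below is a minimal sketch of that rule on the example data. It assumes that `foverlaps()` from data.table with `type = "any"` is a suitable tool here. The result name `common` is just a placeholder, and this small example says nothing about how well the approach scales to the full files.

```r
library(data.table)

# Example data from the question, built directly as data.tables
df1 <- data.table(name   = c("chr1", "chr2", "chr3", "chr4"),
                  START  = c(1, 300000, 470000, 800000),
                  END    = c(200000, 370000, 500000, 990000),
                  STRAND = c("+", "+", "+", "-"))
df2 <- data.table(name   = c("chr1", "chr2", "chr3", "chr4"),
                  START  = c(55000, 365000, 372000, 750000),
                  END    = c(187000, 371000, 469000, 835000),
                  STRAND = c("+", "+", "+", "-"))

# foverlaps() requires the second table to be keyed, with the
# interval start and end as the last two key columns
setkey(df1, name, START, END)

# type = "any": keep a df2 row if its range shares at least one position
# with a df1 range of the same name; nomatch = 0L drops non-overlapping rows
common <- foverlaps(df2, df1,
                    by.x = c("name", "START", "END"),
                    type = "any", nomatch = 0L)

print(common)
```

In the result, the `START` and `END` columns come from df1, and the columns prefixed with `i.` come from df2. Changing `type` to `"within"` would keep only the df2 ranges that lie completely inside a df1 range.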
So I want to find the ranges that two large files have in common in this overlapping sense, not only the exact matches.
Could someone show me how to do this?