I have two data sets and would like to find the overlapping (common) regions between them. Where there is an overlap, I also want to extract the matching rows from each original table:
Data A:
chr start end
chr1 25 35
chr1 50 70
chr1 60 85
Data B:
chr start end score
chr1 10 15 24
chr1 55 75 14
chr1 76 82 10
Output tables:
Output 1: common regions
chr start end
chr1 55 70
chr1 70 75
chr1 76 82
Output 2: rows extracted from data A:
chr start end
chr1 50 70
chr1 60 85
Output 3: rows extracted from data B:
chr start end score
chr1 55 75 14
chr1 76 82 10
I have tried different ways but I do not know which one is the best:
library(GenomicRanges)
enhancer <- with(dataA, GRanges(chr, IRanges(start = start, end = end)))
H3K4me1 <- with(dataB, GRanges(chr, IRanges(start = start, end = end)))
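Way 1:
# find overlapping pairs, then overwrite the data A ranges with the matching data B ranges
hits <- findOverlaps(enhancer, H3K4me1)
ranges(enhancer)[queryHits(hits)] <- ranges(H3K4me1)[subjectHits(hits)]
enhancer
H3K4me1
Way 2:
# keep the data A ranges that overlap at least one data B range
over <- subsetByOverlaps(enhancer, H3K4me1)
Way 3:
# set intersection of the two range sets
inter <- intersect(enhancer, H3K4me1)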
Way 4:
library(data.table)
groupA <- data.table(dataA)
setkey(groupA, chr, start, end)
groupB <- data.table(dataB)
setkey(groupB, chr, start, end)
# overlap join; start/end come from groupB, i.start/i.end from groupA
over <- foverlaps(groupA, groupB, nomatch = 0)
# intersection of each overlapping pair
over2 <- data.table(
  chr = over$chr,
  start = over[, pmax(start, i.start)],
  end = over[, pmin(end, i.end)])
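To compare the four ways against something simple, here is a package-free sketch in base R of what I mean by the three outputs. It assumes closed intervals and pairwise overlaps on the same chromosome; the names idx, common, fromA and fromB are made up for illustration. With this pairwise definition the example data give 55-70, 60-75 and 76-82, so the second row differs from the 70-75 shown in output 1 above.

```r
# Example tables from the question
dataA <- data.frame(chr = "chr1", start = c(25, 50, 60), end = c(35, 70, 85),
                    stringsAsFactors = FALSE)
dataB <- data.frame(chr = "chr1", start = c(10, 55, 76), end = c(15, 75, 82),
                    score = c(24, 14, 10), stringsAsFactors = FALSE)

# All (row of A, row of B) index pairs
idx <- expand.grid(a = seq_len(nrow(dataA)), b = seq_len(nrow(dataB)))

# Keep pairs on the same chromosome whose closed intervals overlap
keep <- dataA$chr[idx$a] == dataB$chr[idx$b] &
  dataA$start[idx$a] <= dataB$end[idx$b] &
  dataB$start[idx$b] <= dataA$end[idx$a]
idx <- idx[keep, ]

# Output 1: intersection of each overlapping pair
common <- data.frame(chr = dataA$chr[idx$a],
                     start = pmax(dataA$start[idx$a], dataB$start[idx$b]),
                     end = pmin(dataA$end[idx$a], dataB$end[idx$b]))

# Outputs 2 and 3: original rows that take part in at least one overlap
fromA <- dataA[sort(unique(idx$a)), ]
fromB <- dataB[sort(unique(idx$b)), ]
```

This compares every row of A with every row of B, so it is only meant as a check on small inputs, not as a replacement for findOverlaps or foverlaps on real data.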