I'm trying to merge intersecting ranges of values within each of my groups (n = 147). For example:
my.df <- data.frame(chrom=c('0F','0F','4F','4F','4F','4F'), start=as.numeric(c(1405,1700,1420,2500,19116,20070)), stop=as.numeric(c(1700,2038,2527,3401,20070,20730)), strand = c('-','-','-','+','+','+'))
my.df
  chrom start  stop strand
1    0F  1405  1700      -
2    0F  1700  2038      -
3    4F  1420  2527      -
4    4F  2500  3401      +
5    4F 19116 20070      +
6    4F 20070 20730      +
I am trying to merge all of the overlapping ranges within each group while preserving the 'chrom' column and taking the 'strand' column into account, so that ranges are merged only if they are on the same strand:
  chrom start  stop strand
1    0F  1405  2038      -
2    4F  1420  2527      -
3    4F  2500  3401      +
4    4F 19116 20730      +
I've found a few methods for determining the presence of overlaps within each group (e.g., plyranges::count_overlaps), but no way to collapse those intersecting ranges together.
I've tried the method below from a previous question, but it ignores the groupings I require: the ranges from all of my groups end up merged into a single, continuous range, regardless of whether they actually overlap. I've also tried the answers from this question, but none of them worked.
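library(dplyr)

# Ungrouped attempt: start a new cluster whenever a range begins after every earlier stop
my.df %>%
  arrange(start) %>%
  group_by(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>%
  summarise(start = first(start), stop = max(stop))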
start end
1 1405 20730
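For reference, one possible grouped variant of the snippet above, as a minimal sketch rather than a verified solution. It assumes dplyr is loaded and that ranges that merely touch (a stop equal to the next start) should be merged, as in my expected output. The name `merged` and the intermediate cluster column `g` are only illustrative.

```r
library(dplyr)

my.df <- data.frame(
  chrom  = c("0F", "0F", "4F", "4F", "4F", "4F"),
  start  = c(1405, 1700, 1420, 2500, 19116, 20070),
  stop   = c(1700, 2038, 2527, 3401, 20070, 20730),
  strand = c("-", "-", "-", "+", "+", "+")
)

merged <- my.df %>%
  arrange(chrom, strand, start) %>%
  # compute clusters separately for each chrom/strand combination
  group_by(chrom, strand) %>%
  # a new cluster starts when a range begins after every earlier stop in its group
  mutate(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>%
  # collapse each cluster to a single range
  group_by(chrom, strand, g) %>%
  summarise(start = first(start), stop = max(stop), .groups = "drop") %>%
  select(chrom, start, stop, strand) %>%
  arrange(chrom, start)

merged
```

The only change from the ungrouped attempt is that the cluster index is computed inside `group_by(chrom, strand)`, so a range on one chromosome or strand cannot extend a cluster on another.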