Using the foverlaps
function from the data.table
package I get overlapping regions (it shows only 25 lines but it's more than 50 thousand) and I would like to group the overlapping regions for each id taking into account the following criteria:
If they have the same ID and overlapping regions belonging to the same or different group, then:
- group them all, 2) extend the range (i.e. start = min(overlapping item set) and end=max(overlapping item set)), and 3) place the name of the group of the maximum score. For example, given the data set:
dt <- data.table::data.table(
ID=c("1015_4_1_1","1015_4_1_1","1015_4_1_1","103335_0_1_2","103335_0_1_2",
"103335_0_1_2","11099_0_1_1","11099_0_1_1","11099_0_1_1","11099_0_1_1","11099_0_1_1",
"11702_0_1_1","11702_0_1_1","11702_0_1_1","11702_0_1_5","11702_0_1_5","11702_0_1_5",
"140331_0_1_1","140331_0_1_1","140331_0_1_1","14115_0_1_7","14115_0_1_7",
"14115_0_1_7","14115_0_1_8","14115_0_1_8"),
start=c(193,219,269,149,149,163,51,85,314,331,410,6193,6269,6278,6161,6238,6246,303,304,316,1525,1526,1546,1542,1543),
end=c(307,273,399,222,235,230,158,128,401,428,507,6355,6337,6356,6323,6305,6324,432,396,406,1603,1688,1612,1620,1705),
group=c("R7","R5","R5","R4","R5","R6","R7","R5","R4","R5","R5","R5","R6","R4","R5","R6","R4","R5","R4","R6","R4","R5","R6","R4","R5"),
score=c(394,291,409,296,319,271,318,252,292,329,252,524,326,360,464,340,335,515,506,386,332,501,307,308,443)
)
The expected result is:
# 1015_4_1_1 193 399 R5 409
# 103335_0_1_2 149 235 R5 319
# 11099_0_1_1 51 158 R7 318
# 11099_0_1_1 314 507 R5 329
# 11702_0_1_1 6193 6356 R5 524
# 11702_0_1_5 6161 6324 R5 464
# 140331_0_1_1 303 432 R5 515
# 14115_0_1_7 1525 1705 R5 501
note that for each ID there may be subgroups of regions that do not overlap each other, for example in "11099_0_1_1" rows 7 and 8 are grouped in one subgroup and the rest in another subgroup.
I have no experience with GenomicRanges
or IRanges
, and read in another comment that data.table
is usually faster. So, since I was expecting a lot of overlapping regions, I started with foverlaps from data.table
, but I don't know how to proceed. I hope you can help me, and thank you very much in advance