This is a followup of a question I asked yesterday, now extended to include more than 2 inputs. I was able to find two related answers on SO, but none of them gave quite enough information for me to solve this in a performant way.
I would like to combine a list of IRanges into a single IRanges. Here's an example input:
[[1]]
IRanges object with 2 ranges and 1 metadata column:
start end width | on_betalac
<integer> <integer> <integer> | <logical>
[1] 1 21 21 | FALSE
[2] 22 22 1 | TRUE
[[2]]
IRanges object with 2 ranges and 1 metadata column:
start end width | on_other
<integer> <integer> <integer> | <logical>
[1] 1 21 21 | FALSE
[2] 22 22 1 | TRUE
[[3]]
IRanges object with 1 range and 1 metadata column:
start end width | on_pen
<integer> <integer> <integer> | <logical>
[1] 1 22 22 | FALSE
[[4]]
IRanges object with 3 ranges and 1 metadata column:
start end width | on_quin
<integer> <integer> <integer> | <logical>
[1] 1 3 3 | FALSE
[2] 4 13 10 | TRUE
[3] 14 22 9 | FALSE
For ease of replication, the dput
of this list is at the end of my post.
And my desired output is:
IRanges object with 4 ranges and 4 metadata columns:
start end width | on_betalac on_other on_pen on_quin
<integer> <integer> <integer> | <logical> <logical> <logical> <logical>
[1] 1 3 3 | FALSE FALSE FALSE FALSE
[2] 4 13 10 | FALSE FALSE FALSE TRUE
[3] 14 21 8 | FALSE FALSE FALSE FALSE
[4] 22 22 1 | TRUE TRUE FALSE FALSE
You can see that the output is kind of like the disjoin of the input, but with the mcols propagated through, so that each output row has the mcols of the input row that "gave rise" to it.
Here's my solution, which works, but is quite slow.
combine_exposures <- function(exposures) {
cd <- do.call(what = c, args = exposures)
mc <- mcols(cd)
dj <- disjoin(x = cd, with.revmap = TRUE)
r <- mcols(dj)$revmap
d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
names(d) <- names(mc)
for (i in 1:length(dj)) {
d[i,] <- sapply(X = 1:ncol(mc), FUN = function(j) { mc[r[[i]][j], j] })
}
mcols(dj) <- d
return(dj)
}
And here's the dput of the sample input:
list(new("IRanges", start = c(1L, 22L), width = c(21L, 1L), NAMES = NULL,
elementType = "ANY", elementMetadata = new("DataFrame", rownames = NULL,
nrows = 2L, listData = list(on_betalac = c(FALSE, TRUE
)), elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = c(1L, 22L), width = c(21L,
1L), NAMES = NULL, elementType = "ANY", elementMetadata = new("DataFrame",
rownames = NULL, nrows = 2L, listData = list(on_other = c(FALSE,
TRUE)), elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = 1L, width = 22L,
NAMES = NULL, elementType = "ANY", elementMetadata = new("DataFrame",
rownames = NULL, nrows = 1L, listData = list(on_pen = FALSE),
elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = c(1L, 4L, 14L),
width = c(3L, 10L, 9L), NAMES = NULL, elementType = "ANY",
elementMetadata = new("DataFrame", rownames = NULL, nrows = 3L,
listData = list(on_quin = c(FALSE, TRUE, FALSE)), elementType = "ANY",
elementMetadata = NULL, metadata = list()), metadata = list()))