In a comment, the OP pointed out:
for my purposes I don't care about the arrangement of elements so long
as all combinations of Col1 and (Col3&Col4) per group ID exist in the
output
So, if I understand correctly, the question is not about collapsing but about creating all possible combinations of non-NA values of columns Col1
, Col2
, and combined columns (Col3
, Col4
) for each ID
group.
For this, expand()
and nesting()
from the tidyr package can be used to create the combinations. na.omit()
removes all rows containing any NA
afterwards:
library(dplyr)
library(tidyr)
DF %>%
group_by(ID) %>%
expand(Col1, Col2, nesting(Col3, Col4)) %>%
na.omit() %>%
ungroup()
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 5 10 15 20
2 1 6 10 15 20
3 2 13 25 21 34
4 2 13 25 35 40
5 2 17 25 21 34
6 2 17 25 35 40
This approach should work well also with non-numerical data.
Edit 1
Giving a second thought, I was wondering about the peculiar structure of the input dataset, namely the positions of the NA
s:
DF
ID Col1 Col2 Col3 Col4
1 1 6 10 NA NA
2 1 5 10 NA NA
3 1 NA 10 15 20
4 2 17 25 NA NA
5 2 13 25 NA NA
6 2 NA 25 21 34
7 2 NA 25 35 40
To me, it appears as if DF
was constructed from three separate subsets, the first one for Col1
ID Col1
1 1 6
2 1 5
4 2 17
5 2 13
the second one for Col2
ID Col2
1 1 10
4 2 25
the third one for Col3
and Col4
ID Col3 Col4
3 1 15 20
6 2 21 34
7 2 35 40
Based on this observation, here is a different approach which creates all possible combinations of the subsets by a series of merge operations (cartesian joins) of the subsets:
library(magrittr) # piping used her to improve readability
list("Col1", "Col2", c("Col3", "Col4")) %>%
lapply(function(x) DF[c("ID", x)] %>% na.omit %>% unique) %>%
Reduce(merge, .)
ID Col1 Col2 Col3 Col4
1 1 6 10 15 20
2 1 5 10 15 20
3 2 17 25 21 34
4 2 17 25 35 40
5 2 13 25 21 34
6 2 13 25 35 40
Here, lapply()
creates a list of subsets of the input dataset which is then merged repeatedly using Reduce()
.
Edit 2:
With version 4.1.0, R has gained a simple native forward pipe syntax |>
and \()
as a shorthand notation for function()
. With this, the code of Edit 1 can be re-written to use only base R (without magrittr):
list("Col1", "Col2", c("Col3", "Col4")) |>
lapply(\(x) DF[c("ID", x)] |> na.omit() |> unique()) |>
(\(z) Reduce(merge, z))()