I have a data frame with 2 columns, an ID (X1) and a category name (X2):
X1 X2
1234 Metal
1234 Metal
1234 Plastic
1234 Plastic
1234 Glass
1235 Metal
1235 Metal
1235 Plastic
1235 Plastic
1235 Glass
1236 Glass
1236 Glass
1236 Metal
1236 Metal
1236 Plastic
I want to find the most frequent combinations of 2 categories, along with the count of each combination, across the entire dataset (for a larger dataset I will also want combinations of 3 or 4):
Metal, Plastic 2
Glass, Metal 1
I tried to first generate all possible combinations of X2 by ID (X1), so I could then use dplyr to aggregate and subset the top combinations. Unfortunately, my dataset is too large for this to run efficiently. Any ideas on an easier and faster way to figure this out?
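
For reference, here is a minimal sketch of the kind of approach described above. It is not the exact code that was run. It assumes each ID contributes its distinct categories once, which may not be the intended counting rule, so the expected output shown earlier may follow a different one. The helper name combos_for_id and the variable k are hypothetical, introduced only for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  X1 = rep(c(1234, 1235, 1236), each = 5),
  X2 = c("Metal", "Metal", "Plastic", "Plastic", "Glass",
         "Metal", "Metal", "Plastic", "Plastic", "Glass",
         "Glass", "Glass", "Metal", "Metal", "Plastic"),
  stringsAsFactors = FALSE
)

k <- 2  # combination size (2 here; 3 or 4 for the larger dataset)

# Hypothetical helper: all size-k combinations of the distinct
# categories seen for one ID, each collapsed into a single label.
# Assumption: duplicates within an ID are ignored.
combos_for_id <- function(x, k) {
  cats <- sort(unique(x))
  if (length(cats) < k) return(character(0))
  combn(cats, k, FUN = paste, collapse = ", ")
}

# Generate the combinations ID by ID, then stack them in one column
per_id <- lapply(split(df$X2, df$X1), combos_for_id, k = k)
combos <- data.frame(
  combo = unlist(per_id, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Aggregate with dplyr: one row per combination, most frequent first
combo_counts <- combos %>% count(combo, sort = TRUE)
print(combo_counts)
```

The cost grows with choose(n, k) per ID, where n is the number of distinct categories in that ID, which is why materializing every combination gets slow for larger k or for IDs with many categories.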