I am struggling to count the number of unique combinations in my data. I would like to first group the rows by id and then count how many times each combination of values occurs. The order of the elements does not matter: 'd-f' and 'f-d' belong to the same category because they contain the same elements:

combinations: 

       n
c-f:   2   # also f-c
c-d-f: 1   # also c-f-d or f-d-c
d-f:   2   # also f-d. The dash is only for visualization purposes

Dummy example:

# my data
dd <- data.frame(id = c(1,1,2,2,2,3,3,4, 4, 5,5),
             cat = c('c','f','c','d','f','c','f', 'd', 'f', 'f', 'd'))



> dd
  id cat
1  1   c
2  1   f
3  2   c
4  2   d
5  2   f
6  3   c
7  3   f
8  4   d
9  4   f
10  5   f
11  5   d

Using paste is a great solution provided by @benson23, but it treats f-d and d-f as two distinct categories. I would like the order not to matter. Thank you!

maycca

1 Answer

Create a "combination" column in summarise; we can then count the values in this column.

An easy way to make the order irrelevant is to sort the rows at the beginning, so that the categories within each id are always pasted in the same order.

library(dplyr)

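dd %>% 
  group_by(id) %>% 
  # sort by id and category so that f-d and d-f both become d-f
  arrange(id, cat) %>% 
  # paste the sorted categories of each id into one string
  summarize(combination = paste0(cat, collapse = "-"), .groups = "drop") %>% 
  # count how often each combination occurs
  count(combination)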

# A tibble: 3 x 2
  combination     n
  <chr>       <int>
1 c-d-f           1
2 c-f             2
3 d-f             2
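
As a variation on the same idea, the sorting can also happen inside each group instead of on the whole data frame. The following is only a minimal sketch, assuming the `dd` data frame from the question and that `cat` is a character column; the name `res` is introduced here purely for illustration.

```r
library(dplyr)

# data from the question
dd <- data.frame(id  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5),
                 cat = c("c", "f", "c", "d", "f", "c", "f", "d", "f", "f", "d"))

res <- dd %>%
  group_by(id) %>%
  # sort() puts the categories of each id into a canonical order,
  # so "f", "d" and "d", "f" both end up as "d-f"
  summarise(combination = paste(sort(cat), collapse = "-"), .groups = "drop") %>%
  count(combination)

res
```

With this form the result does not depend on the row order of `dd`, at the cost of calling `sort()` once per group. If an id could contain the same category more than once and duplicates should be ignored, wrapping the column in `unique()` before `sort()` would be one option, but that is an assumption about the data rather than something the question states.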
benson23
  • Thank you for the great answer! I wonder how to treat `d-f` and `f-d` as the same category? They contain the same elements, and their order does not matter. Maybe an easy workaround would be to simply order them first? Then the order will be consistent, and I will end up with the same categories. Thank you! – maycca Feb 18 '22 at 12:48
  • 1
    Yes, I agree that the simplest way is to order the elements first. I have included it in my answer. – benson23 Feb 18 '22 at 12:59
  • 1
    I think that at some point the solution will require some form of sorting when counting the combinations, and it gets very ugly to do it after the `summarise` step. Therefore I recommend adding an `arrange` call at the very beginning. – benson23 Feb 18 '22 at 14:39
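
For readers who prefer not to load dplyr, the sort-then-paste idea discussed above can also be sketched in base R. This is an illustrative sketch only, assuming the `dd` data frame from the question; the names `keys` and `counts` are hypothetical and not part of the original answer.

```r
# data from the question
dd <- data.frame(id  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5),
                 cat = c("c", "f", "c", "d", "f", "c", "f", "d", "f", "f", "d"))

# one string per id: sort the categories first so the order does not matter
keys <- tapply(dd$cat, dd$id, function(x) paste(sort(x), collapse = "-"))

# count how many ids share each combination
counts <- table(keys)
counts
```

Here `tapply` builds one key per id and `table` tallies the keys, which mirrors the `summarise` and `count` steps of the dplyr version.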