I have the following dataset, showing the INGREDIENTS contained in each PRODUCT:
data <- data.frame("PRODUCT" = c("Creme","Creme","Creme","Creme","Medoc","Medoc","Medoc","Medoc","Medoc","Hububu","Hububu","Hububu","Hububu","Troll","Troll","Troll","Troll","Suzuki","Suzuki","Gluglu","Gluglu","Gluglu"),
"INGREDIENT" = c("zeze","zaza","zozo","zuzu","zaza","sasa","haha","zuzu","zemzem","zaza","zuzu","zizi","haha","zozo","zaza","zemzem","zuzu","sasa","zuzu","ozam","zaza","hayda"))
I want to know the most common combinations of INGREDIENTS in each PRODUCT: which ingredient is associated with which other ingredient? I applied the code I found in this thread:
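library(dplyr)

# Self-join on PRODUCT to pair each ingredient with every ingredient of the same product
combinaisons_par_PRODUCT = data %>%
  full_join(data, by="PRODUCT") %>%
  group_by(INGREDIENT.x, INGREDIENT.y) %>%
  # Count the distinct products in which each ordered pair occurs
  summarise(n = length(unique(PRODUCT))) %>%
  # Drop pairs of an ingredient with itself
  filter(INGREDIENT.x!=INGREDIENT.y) %>%
  mutate(item = paste(INGREDIENT.x, INGREDIENT.y, sep=", "))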
It works, but there is one final flaw: I would like the order to be ignored. For instance, this code gives me 1 association of HAHA and SASA, and also 1 association of SASA and HAHA. For me, these are the same thing, so I would like the code to ignore the order of INGREDIENTS and give me one unique association of 2 HAHA & SASA.
I tried sorting the INGREDIENTS before applying the code, but that didn't work either. Could someone help me, please? How can I get these combinations regardless of order?
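To make the goal concrete, here is a rough sketch of one way the order could be ignored: keep only the pairs where the first ingredient sorts alphabetically before the second, so each unordered pair survives exactly once. This is only an illustration under assumptions, not necessarily the best approach. The name `pairs_unordered` is made up for the example, `stringsAsFactors = FALSE` is added so the `<` comparison works on plain strings in older R versions, and it assumes the count wanted is the number of distinct products containing both ingredients.

```r
library(dplyr)

# Same sample data as above; stringsAsFactors = FALSE is an added assumption
# so that INGREDIENT stays a character column in R < 4.0.
data <- data.frame(
  PRODUCT = c("Creme","Creme","Creme","Creme",
              "Medoc","Medoc","Medoc","Medoc","Medoc",
              "Hububu","Hububu","Hububu","Hububu",
              "Troll","Troll","Troll","Troll",
              "Suzuki","Suzuki",
              "Gluglu","Gluglu","Gluglu"),
  INGREDIENT = c("zeze","zaza","zozo","zuzu",
                 "zaza","sasa","haha","zuzu","zemzem",
                 "zaza","zuzu","zizi","haha",
                 "zozo","zaza","zemzem","zuzu",
                 "sasa","zuzu",
                 "ozam","zaza","hayda"),
  stringsAsFactors = FALSE
)

# pairs_unordered is a hypothetical name used only in this sketch.
pairs_unordered <- data %>%
  # Self-join: every ingredient paired with every ingredient of the same product.
  # Recent dplyr versions may warn about a many-to-many join; that is expected here.
  inner_join(data, by = "PRODUCT") %>%
  # Keep one orientation per pair; this also drops self-pairs.
  filter(INGREDIENT.x < INGREDIENT.y) %>%
  group_by(INGREDIENT.x, INGREDIENT.y) %>%
  # Number of distinct products containing both ingredients
  summarise(n = n_distinct(PRODUCT), .groups = "drop") %>%
  mutate(item = paste(INGREDIENT.x, INGREDIENT.y, sep = ", ")) %>%
  arrange(desc(n))

print(pairs_unordered)
```

With the sample data, the most frequent pair should be zaza and zuzu (together in 4 products), and haha/sasa should appear once, with n = 1, because only Medoc contains both.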
Thank you very much!