Count of unique combinations despite order

Question

Thanks in advance. I have a data frame of family members and their relationship to the "head of household", and I'd like to count the number of unique combinations of family structures.

I can achieve this (likely in a roundabout way) by converting the data to a wide format and counting the rows with count, but this does not account for identical family structures that are listed in a different order. For example:

familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
              "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember,familyGroup)

Note that familyGroups '2' and '4' are exactly the same family structure in the same order, while familyGroups '1' and '3' are the same family structure but in a different order. I then use plyr's ddply to create an index that numbers the family members within each family group:

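library(plyr)   # ddply() and .()
library(dplyr)  # count(), ungroup() and %>%; loaded after plyr so dplyr's count() is used
familiesIndex <- ddply(families, .(familyGroup), mutate,
          index = paste0('family', seq_along(familyGroup)))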

Next I reshape to wide format:

familiesIndex_reshape <- reshape(familiesIndex, idvar = "familyGroup", timevar="index", direction = "wide")

Finally, I use count to get the number of unique combinations:

familiesIndex_reshape_Unique <- count(familiesIndex_reshape, 
                                 familyMember.family1,
                                 familyMember.family2,
                                 familyMember.family3,
                                 familyMember.family4) %>% ungroup()

This leads to separate groups for familyGroups 1 and 3. I'd like these two groups to be counted as the same despite their order. Thanks so much, again.

You just need to sort each family by something before creating your index/pasting. `families = families[order(families$familyGroup, families$familyMember), ]` Do that before your other commands. — Gregor Thomas, Aug 15 '16 at 16:22
Also, you seem to be using a strange mix of `plyr` (`ddply()`) and `dplyr` (`%>% ungroup()`). I'd strongly recommend switching to only `dplyr`. — Gregor Thomas, Aug 15 '16 at 16:23
To that end, your `ddply` could be rewritten as `group_by(families, familyGroup) %>% mutate(index = paste0('family', 1:n()))`. — Gregor Thomas, Aug 15 '16 at 16:25
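
Putting the comments above together, here is a minimal dplyr-only sketch of the sort-before-counting idea. It is not the commenter's exact code: instead of sorting and then reshaping to wide format, it sorts the members within each family and collapses them into a single string, which is a swapped-in shortcut. The names `structure`, `nFamilies` and `structureCounts` are made up for illustration.

```r
library(dplyr)

# Example data from the question
familyMember <- c("son", "son", "Head of household",
                  "daughter", "grandmother", "Head of household", "son",
                  "Head of household", "son", "son",
                  "daughter", "grandmother", "Head of household", "son")
familyGroup <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
families <- data.frame(familyMember, familyGroup)

structureCounts <- families %>%
  group_by(familyGroup) %>%
  # Sort the members inside each family so that order no longer matters,
  # then collapse them into one key per family ("structure" is a hypothetical name)
  summarise(structure = paste(sort(as.character(familyMember)), collapse = ", ")) %>%
  # Count how many families share each sorted key
  count(structure, name = "nFamilies")

print(structureCounts)
```

With the example data this should give two distinct structures, each shared by two families (groups 1 and 3 together, groups 2 and 4 together). The exact order of names inside each key depends on the locale's sort order, but it is the same for every family, which is all the comparison needs.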
