
Thanks in advance. I have a data frame of family members and their relationship to the "head of household", and I'd like to count the number of unique family structures (that is, distinct combinations of members).

I can do this (probably in a roundabout way) by converting the data to wide format and using `count()`, but this approach does not recognize identical family structures whose members are listed in a different order. For example:

```r
familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
                  "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember,familyGroup)
```

Note that families 2 and 4 have exactly the same structure, with their members in the same order. Families 1 and 3 also have the same structure, but their members are listed in a different order. I then use `ddply()` from plyr to create an index that numbers the members within each family:

```r
library(plyr)

# Number the members within each family: family1, family2, ...
familiesIndex <- ddply(families, .(familyGroup), mutate,
                       index = paste0('family', 1:length(familyGroup)))
```
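For comparison, the same per-family index can be built in base R without plyr. This is a minimal sketch using `ave()`; `withinFamily` is just an illustrative helper name, and `stringsAsFactors = FALSE` is added so the member labels stay plain character strings.

```r
familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
                  "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember, familyGroup, stringsAsFactors = FALSE)

# ave() splits its first argument by familyGroup, applies FUN to each piece and
# puts the results back in the original row order; seq_along() turns a piece
# of length k into 1, 2, ..., k.
withinFamily <- ave(seq_along(families$familyGroup), families$familyGroup,
                    FUN = seq_along)
families$index <- paste0("family", withinFamily)
families
```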

Next I reshape to wide format:

```r
familiesIndex_reshape <- reshape(familiesIndex, idvar = "familyGroup", timevar = "index", direction = "wide")
```

Finally, I use count to get the number of unique combinations:

```r
library(dplyr)  # loaded after plyr, so dplyr's count() masks plyr's

familiesIndex_reshape_Unique <- count(familiesIndex_reshape,
                                      familyMember.family1,
                                      familyMember.family2,
                                      familyMember.family3,
                                      familyMember.family4) %>% ungroup()
```

This produces separate rows for families 1 and 3. I'd like these two to be counted as the same structure regardless of the order of their members. Thanks so much, again.
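A minimal base R sketch of an order-insensitive count, assuming that sorting the member labels within each family is an acceptable canonical form. It skips the wide reshape entirely and also shows why the ordered comparison splits families 1 and 3. `membersByFamily`, `orderedKey`, `familyStructure` and `structureCounts` are hypothetical names chosen for illustration.

```r
familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
                  "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember, familyGroup, stringsAsFactors = FALSE)

# One character vector of members per family, named "1", "2", "3", "4"
membersByFamily <- split(families$familyMember, families$familyGroup)

# Order-sensitive key: roughly what comparing wide-format rows amounts to
orderedKey <- vapply(membersByFamily, paste, character(1), collapse = " | ")

# Order-insensitive key: sort the members first, so families 1 and 3 match
familyStructure <- vapply(membersByFamily,
                          function(m) paste(sort(m), collapse = " | "),
                          character(1))

length(unique(orderedKey))       # 3 distinct structures
length(unique(familyStructure))  # 2 distinct structures

# Number of families sharing each structure
structureCounts <- as.data.frame(table(familyStructure), stringsAsFactors = FALSE)
structureCounts
```

The exact position of "Head of household" relative to the lowercase labels after `sort()` depends on the locale, but it is consistent within a session, which is all the comparison needs.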

Reggie Milton
  • You just need to sort each family by something before creating your index/pasting. `families = families[order(families$familyGroup, families$familyMember), ]` Do that before your other commands. – Gregor Thomas Aug 15 '16 at 16:22
  • Also, you seem to be using a strange mix of `plyr` (`ddply()`) and `dplyr` (`%>% ungroup()`). I'd strongly recommend switching to only `dplyr`. – Gregor Thomas Aug 15 '16 at 16:23
  • To that end, your `ddply` could be rewritten as `group_by(families, familyGroup) %>% mutate(index = paste0('family', 1:n()))`. – Gregor Thomas Aug 15 '16 at 16:25
  • Thanks @Gregor, I appreciate the time. – Reggie Milton Aug 15 '16 at 20:33
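Putting the comments' suggestions together, here is a sketch of the question's own wide-format pipeline written with dplyr, with the rows sorted before the index is created. It assumes dplyr is installed; `familiesWide` and `structureCounts` are illustrative names, and `row_number()` is used in place of the equivalent `1:n()`.

```r
library(dplyr)

familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
                  "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember, familyGroup, stringsAsFactors = FALSE)

familiesIndex <- families %>%
  arrange(familyGroup, familyMember) %>%           # sort members within each family first
  group_by(familyGroup) %>%
  mutate(index = paste0("family", row_number())) %>%
  ungroup()

# reshape() is base R; convert the tibble back to a plain data frame first
familiesWide <- reshape(as.data.frame(familiesIndex), idvar = "familyGroup",
                        timevar = "index", direction = "wide")

# Families 1 and 3 now give identical rows, as do families 2 and 4
structureCounts <- count(familiesWide, familyMember.family1, familyMember.family2,
                         familyMember.family3, familyMember.family4)
structureCounts
```

Sorting is what makes this work: once every family lists its members in the same canonical order, equal structures produce equal wide rows, and `count()` groups them together (the missing `familyMember.family4` of the three-member families is `NA` in both rows, so those rows still match).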
