I am working with data on 10,000 individuals. The data has 8 binary (0/1) variables. Each variable indicates whether a survey module exists (1) or not (0) for that individual. Overall, there are 2^8 = 256 possible combinations of 0 and 1 across the eight variables for each individual.
Aim: I want to group individuals with identical rows (that is, individuals who took part in the same modules).
My data looks like the following example with only three variables:
# example
dat <- data.frame(id = 1:8, # unique ID
v1 = rep(0:1, 4),
v2 = rep(1:0, 4),
v3 = rep(1L, 8))
# I can find the unique rows
unique(dat[ , -1])
# I can also count the number of occurrences of each unique row (as suggested by http://stackoverflow.com/questions/12495345/find-indices-of-duplicated-rows)
library(plyr)
ddply(dat[ , -1], .(v1, v2, v3), nrow)
# But I need the occurrence information at the individual level, like this:
dat$v4 <- rep(c("group1", "group2"), 4)
# The number of rows alone is not sufficient, because different combinations can have the same count
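To make the target concrete, here is a minimal base-R sketch of the kind of result I am after. It is only one possible approach, not necessarily the best one. It assumes the indicator columns can be pasted into a single pattern string per row, and the names `pattern` and `group` are made up for illustration:

```r
# Rebuild the example data (without the hand-made v4 column)
dat <- data.frame(id = 1:8,            # unique ID
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1L, 8))

# Collapse each row's 0/1 values into one string, e.g. "0_1_1"
# (assumption: "_" never appears in the values, which holds for 0/1 data)
pattern <- do.call(paste, c(dat[ , c("v1", "v2", "v3")], sep = "_"))

# Number the distinct patterns in order of first appearance,
# so every individual gets the label of its own combination
dat$group <- paste0("group", match(pattern, unique(pattern)))

# Group sizes, if they are needed as well
table(dat$group)
```

With the real data, the column selection would be the eight module indicators instead of `v1` to `v3`. Because the label is derived from the pattern itself rather than from the count, two different combinations with the same number of rows still end up in different groups.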