Group individuals with identical rows

Question

I am working with data on 10,000 individuals. The data has 8 binary (0/1) variables. Each variable indicates whether a survey module exists for the individual (1) or not (0). Overall, 2^8 = 256 combinations of 0s and 1s are possible for each individual.
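As a quick check of the 2^8 = 256 figure, base R's `expand.grid()` can enumerate every possible response pattern. This is only an illustration; the column names `m1` to `m8` are hypothetical placeholders, not the real variable names.

```r
# Enumerate every possible 0/1 pattern for 8 module indicators.
# The column names m1..m8 are hypothetical placeholders.
k <- 8
patterns <- expand.grid(rep(list(0:1), k))
names(patterns) <- paste0("m", seq_len(k))

nrow(patterns)  # 2^8 = 256 possible combinations
```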

Aim: I want to group individuals with identical rows (that is, individuals who took part in the same modules).

My data looks like the following example, with only three variables:

# example
dat <- data.frame(id = 1:8,          # unique ID
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 8))    # module 3 taken by everyone

# I can find the unique rows
unique(dat[ , -1])

# I can also count the occurrences of each unique row (as suggested by http://stackoverflow.com/questions/12495345/find-indices-of-duplicated-rows)
library(plyr)
ddply(dat[ , -1], .(v1, v2, v3), nrow)

# But I need this information at the individual level, i.e. a group label for each individual, like this:
dat$v4 <- rep(c("group1", "group2"), 4)

# The counts alone are not sufficient, because different combinations can have the same count.
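The count of each combination can also be attached to every individual in base R, which illustrates the problem from the last comment: in this example both combinations occur four times, so the count alone cannot tell the two groups apart. A minimal sketch; `key` and `n_same` are hypothetical names introduced here.

```r
# Example data from the question (v3 is 1 for everyone)
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 8))

# Collapse each individual's module indicators into one string, e.g. "011"
key <- do.call(paste0, dat[c("v1", "v2", "v3")])

# Number of individuals sharing the same pattern, attached per individual
dat$n_same <- ave(dat$id, key, FUN = length)
dat$n_same  # 4 for everyone: both patterns have the same count
```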

Can't you just use `with(dat, v1 + 2 * v2 + 4 * v3)` as a grouping variable? — Ernest A, Aug 14 '16 at 19:09
Thanks, @user20650! That helps and is a very easy solution! — maller, Aug 15 '16 at 07:08
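The binary-encoding idea in Ernest A's comment generalizes to any number of indicator columns: weighting column j by 2^(j-1) turns each 0/1 row into a unique integer between 0 and 2^k - 1, so identical rows get identical codes. A minimal sketch, assuming the indicator columns are exactly `v1`, `v2`, `v3` (for the real data, list all eight); `code` is a hypothetical column name.

```r
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 8))

vars <- c("v1", "v2", "v3")          # for the real data: all 8 module indicators
weights <- 2^(seq_along(vars) - 1)   # 1, 2, 4, ... (one power of two per column)

# Row-wise weighted sum, i.e. v1 + 2*v2 + 4*v3; each 0/1 pattern gets its own integer
dat$code <- drop(as.matrix(dat[vars]) %*% weights)
dat$code  # 6 5 6 5 6 5 6 5
```

With eight indicators the codes range from 0 to 255, one per possible pattern, so `code` can be used directly as a grouping variable.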

Answer 1 · answered Feb 14 '18 at 16:27

I'd recommend `.GRP` from the data.table package for this:

library(data.table)
as.data.table(dat)[, v4 := sprintf("group_%s", .GRP), by = .(v1, v2, v3)][]
   id v1 v2 v3      v4
1:  1  0  1  1 group_1
2:  2  1  0  1 group_2
3:  3  0  1  1 group_1
4:  4  1  0  1 group_2
5:  5  0  1  1 group_1
6:  6  1  0  1 group_2
7:  7  0  1  1 group_1
8:  8  1  0  1 group_2
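If you prefer to avoid the dependency, a base R sketch produces the same labels for this data: `match(key, unique(key))` numbers the distinct patterns in order of first appearance, which matches the `group_1`/`group_2` labels in the output above. This is an alternative approach, not a description of how `.GRP` works internally; `key` is a hypothetical helper.

```r
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 8))

vars <- c("v1", "v2", "v3")   # for the real data: all 8 module indicators

# One string per individual describing which modules they took, e.g. "0_1_1"
key <- do.call(paste, c(dat[vars], sep = "_"))

# Number the distinct patterns in order of first appearance
dat$v4 <- sprintf("group_%s", match(key, unique(key)))
dat
```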
