I am trying to isolate the unique groups of items in my data. By that I mean unique groupings of rows associated with a key column, not unique items, which is what most people use the unique() function for. The question takes some careful reading, so please be kind enough to digest the example first.
To be clear, I do NOT want the unique subset of the group column, nor unique subsets of items, nor even unique combinations of groups and items. I know those have been covered elsewhere (e.g. "unique() for more than one variable"). What I want are unique sets of items, where each set is defined by a group.
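To make the distinction concrete, here is a small base-R sketch on a made-up toy vector. The names `g`, `it` and `sets` are hypothetical and used only for illustration. `unique()` on the items returns individual values. The sets I am after come from three steps: split the items by group, sort each piece so that order does not matter, and then call `unique()` on the resulting list.

```r
# Hypothetical toy data: groups "a" and "b" hold the same items in a different order
g  <- c("a", "a", "b", "b", "c", "c", "d")
it <- c(1, 2, 2, 1, 3, 4, 3)

unique(it)   # unique items: 1 2 3 4 (not what I want)

# One set per group: sort (and de-duplicate) each group's items so that order is ignored
sets <- lapply(split(it, g), function(x) sort(unique(x)))
unique(sets) # unique sets: {1,2}, {3,4}, {3}
```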
Here is an example:
set.seed(1234)
library(data.table)
A <- data.table(group = rep(c("A","B","C","D","E","F"),each = 4),
item = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2, 1, 2, 4, 3, 6, 3,
5, 2, 10, 12, 1, 2), c = runif(8))
A <- A[-23, ] # drop row 23 (group F, item 1) so that the groups are unbalanced
> A
group item c
1: A 1 0.15904600
2: A 2 0.03999592
3: A 4 0.21879954
4: A 3 0.81059855
5: B 5 0.52569755
6: B 2 0.91465817
7: B 3 0.83134505
8: B 6 0.04577026
9: C 10 0.15904600
10: C 12 0.03999592
11: C 1 0.21879954
12: C 2 0.81059855
13: D 1 0.52569755
14: D 2 0.91465817
15: D 4 0.83134505
16: D 3 0.04577026
17: E 6 0.15904600
18: E 3 0.03999592
19: E 5 0.21879954
20: E 2 0.81059855
21: F 10 0.52569755
22: F 12 0.91465817
23: F 2 0.04577026
# The unique groups are A:F, and the unique items are 1:6, 10 and 12.
# The unique sets of items are:
# (set1) 1,2,3,4; (set2) 5,2,3,6; (set3) 10,12,1,2; (set4) 10,12,2
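One way to get at these sets is sketched below. It reflects my own assumptions and is not a tested solution. The idea is to build an order-insensitive text signature for each group (its sorted, de-duplicated items pasted together) and then number the distinct signatures. The names `sig`, `set` and `set_id` are made up for illustration. The `c` column is left out because it plays no role here.

```r
library(data.table)

# Groups and items from the example above (c omitted)
A <- data.table(group = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
                item  = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2,
                          1, 2, 4, 3, 6, 3, 5, 2, 10, 12, 1, 2))
A <- A[-23, ]

# One row per group: a text signature of its sorted, de-duplicated items, so that
# groups holding the same items in any order end up with the same signature
sig <- A[, .(set = paste(sort(unique(item)), collapse = ",")), by = group]

# Number the distinct signatures in order of first appearance
sig[, set_id := match(set, unique(set))]

unique(sig$set) # "1,2,3,4" "2,3,5,6" "1,2,10,12" "2,10,12"
sig$set_id      # 1 2 3 1 2 4 (A and D share set 1, B and E share set 2)
```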
I want to retrieve these unique sets of items (note again that the item sets are formed by groups). The third column means little at this point; for fun, I sum it by item within each set. The output table should look like this:
group item c
A 1 0.68474355 # groups A and D share this same set of items (set1)
A 2 0.95465409
A 4 1.05014459 # c sums group A item 4's c with group D item 4's c
A 3 0.85636881
B 5 0.74449709 # group E has the same items (set2), though in a different order; c is summed by item
B 2 1.72525672
B 3 0.87134097
B 6 0.20481626
C 10 0.15904600
C 12 0.03999592
C 1 0.21879954
C 2 0.81059855
F 10 0.52569755 # not the same set as group C
F 12 0.91465817
F 2 0.04577026
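Building on the same signature idea, here is a rough data.table sketch of the desired output. This is an assumption on my part, not necessarily the best approach. It maps every group to the first group that has the same set, then sums `c` by that representative group and item. The names `set`, `rep_group` and `out` are hypothetical.

```r
library(data.table)

# Example data from above; rep() spells out the recycling of runif(8) over 24 rows
set.seed(1234)
A <- data.table(group = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
                item  = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2,
                          1, 2, 4, 3, 6, 3, 5, 2, 10, 12, 1, 2),
                c     = rep(runif(8), length.out = 24))
A <- A[-23, ]

# Order-insensitive signature of each group's item set (added to A by reference)
A[, set := paste(sort(unique(item)), collapse = ","), by = group]

# Label every set with the first group that has it: D maps to A, E maps to B
A[, rep_group := group[1L], by = set]

# Sum c by (representative group, item); `by` keeps groups in order of first
# appearance, so each set's items come out in the order of its first group
out <- A[, .(c = sum(c)), by = .(group = rep_group, item)]
out
```

`out` has 15 rows in the same group and item order as the table above. Because `unique()` is applied inside the signature, a group that lists an item twice is treated like one that lists it once. Dropping `unique()` would compare multisets instead. Note also that the two `:=` calls add helper columns to `A` by reference.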
I suppose there might be a way of doing this by reshaping, but it would be quite awkward. My data is large, so efficient solutions (e.g. with data.table) would be very much appreciated.
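For comparison, the reshape route I mentioned might look roughly like the sketch below. I have not benchmarked it. It counts each (group, item) pair, casts to one row per group with one count column per item, and keeps the groups whose rows do not repeat an earlier row. The names `cnt`, `w`, `item_cols` and `keep` are hypothetical. The wide table needs one column per distinct item, which is why I expect this to be awkward on large data.

```r
library(data.table)

# Groups and items from the example above (c omitted)
A <- data.table(group = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
                item  = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2,
                          1, 2, 4, 3, 6, 3, 5, 2, 10, 12, 1, 2))
A <- A[-23, ]

# Count each (group, item) pair, then go wide: one row per group,
# one column per distinct item, 0 where the group lacks that item
cnt <- A[, .N, by = .(group, item)]
w   <- dcast(cnt, group ~ item, value.var = "N", fill = 0L)

# A group whose item-count row duplicates an earlier row holds an already-seen set
item_cols <- setdiff(names(w), "group")
keep <- w[!duplicated(w, by = item_cols), group]
keep # "A" "B" "C" "F"
```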