Grouping a dataframe based on count of common elements

Question

I have a dataframe(df) with the following structure:

Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5

I was hoping for an approach to get a group/cluster for stores based on the common elements in them, that is the following:

Store Group Common_element_with_group
S1    1     I1,I2,I3,I4
S2    2     I1,I2,I3
S3    2     I1,I2,I3
S4    3     I5

Does anyone know of an method to achieve this, I don't even an approach for this.

How do you get the values in column `Group` from your sample input data? — Maurits Evers, Mar 05 '18 at 11:46
So what *is* the logic? What's the rule for getting the values in `Group` based on the input data? — Maurits Evers, Mar 05 '18 at 11:50
The logic would be based on the common elements, so if two stores have the same elements then you assign a group to them say '1', if you cant find a store with similar elements then you assign them to their own group — Anubhav Dikshit, Mar 05 '18 at 11:54

score 3 · Accepted Answer · answered Mar 05 '18 at 11:51

3

Here is an option with aggregate from base R

transform(aggregate(.~Store, df, toString), Group = cumsum(!duplicated(Item)))
#  Store           Item Group
#1    S1 I1, I2, I3, I4     1
#2    S2     I1, I2, I3     2
#3    S3     I1, I2, I3     2
#4    S4             I5     3

Or this can be done with data.table

library(data.table)
setDT(df)[, .(Item = toString(Item)), Store][, Group := cumsum(!duplicated(Item))][]

answered Mar 05 '18 at 11:51

akrun

874,273
37
540
662

1

I wonder if `setDT(df)[, .(Item = toString(Item)), Store][, group := .GRP, by = Item]` also works. – jazzurro Mar 05 '18 at 12:04
1

@jazzurro Yes, it is a good one – akrun Mar 05 '18 at 12:05

Maurits Evers · Answer 2 · 2018-03-05T12:05:51.577

You can do something like this in base R:

df <- stack(lapply(split(df, df$Store), function(x) paste0(x$Item, collapse = ",")));
df$Group <- as.numeric(factor(df$values, levels = unique(df$values)));
df;
#       values ind Group
#1 I1,I2,I3,I4  S1     1
#2    I1,I2,I3  S2     2
#3    I1,I2,I3  S3     2
#4          I5  S4     3

df <- read.table(text =
    "Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5", header = T)

Roman · Answer 3 · 2018-03-05T12:04:26.580

You can try:

library(tidyverse)
d %>% 
  group_by(Store) %>% 
  summarise(Common_element_with_group=paste(Item, collapse=","),
            Group=factor(n())) %>% 
  mutate(Group=factor(Group, levels = levels(Group), labels = 1:nlevels(Group)))
# A tibble: 4 x 3
   Store Common_element_with_group  Group
  <fctr>                     <chr> <fctr>
1     S1               I1,I2,I3,I4      1
2     S2                  I1,I2,I3      2
3     S3                  I1,I2,I3      2
4     S4                        I5      3

The data:

d <- read.table(text="Store Item
S1    I1
                S1    I2
                S1    I3
                S1    I4
                S2    I1
                S2    I2
                S2    I3
                S3    I1
                S3    I2
                S3    I3
                S4    I5", header=T)

score 1 · Answer 4 · answered Mar 05 '18 at 11:52

A solution using aggregate.

agg <- aggregate(Item ~ Store, df, paste, collapse = ", ")

Then you can create a column Group with

agg$Group <- seq_len(nrow(agg))

And, finally, change the columns order:

agg <- agg[, c(1, 3, 2)]
agg
#  Store Group           Item
#1    S1     1 I1, I2, I3, I4
#2    S2     2     I1, I2, I3
#3    S3     3     I1, I2, I3
#4    S4     4             I5

Grouping a dataframe based on count of common elements

4 Answers4