0

I have a dataframe(df) with the following structure:

Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5

I was hoping for an approach to get a group/cluster for stores based on the common elements in them, that is the following:

Store Group Common_element_with_group
S1    1     I1,I2,I3,I4
S2    2     I1,I2,I3
S3    2     I1,I2,I3
S4    3     I5

Does anyone know of an method to achieve this, I don't even an approach for this.

Anubhav Dikshit
  • 1,729
  • 6
  • 25
  • 48

4 Answers4

3

Here is an option with aggregate from base R

transform(aggregate(.~Store, df, toString), Group = cumsum(!duplicated(Item)))
#  Store           Item Group
#1    S1 I1, I2, I3, I4     1
#2    S2     I1, I2, I3     2
#3    S3     I1, I2, I3     2
#4    S4             I5     3

Or this can be done with data.table

library(data.table)
setDT(df)[, .(Item = toString(Item)), Store][, Group := cumsum(!duplicated(Item))][]
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You can do something like this in base R:

df <- stack(lapply(split(df, df$Store), function(x) paste0(x$Item, collapse = ",")));
df$Group <- as.numeric(factor(df$values, levels = unique(df$values)));
df;
#       values ind Group
#1 I1,I2,I3,I4  S1     1
#2    I1,I2,I3  S2     2
#3    I1,I2,I3  S3     2
#4          I5  S4     3

df <- read.table(text =
    "Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
1

You can try:

library(tidyverse)
d %>% 
  group_by(Store) %>% 
  summarise(Common_element_with_group=paste(Item, collapse=","),
            Group=factor(n())) %>% 
  mutate(Group=factor(Group, levels = levels(Group), labels = 1:nlevels(Group)))
# A tibble: 4 x 3
   Store Common_element_with_group  Group
  <fctr>                     <chr> <fctr>
1     S1               I1,I2,I3,I4      1
2     S2                  I1,I2,I3      2
3     S3                  I1,I2,I3      2
4     S4                        I5      3

The data:

d <- read.table(text="Store Item
S1    I1
                S1    I2
                S1    I3
                S1    I4
                S2    I1
                S2    I2
                S2    I3
                S3    I1
                S3    I2
                S3    I3
                S4    I5", header=T)
Roman
  • 17,008
  • 3
  • 36
  • 49
1

A solution using aggregate.

agg <- aggregate(Item ~ Store, df, paste, collapse = ", ")

Then you can create a column Group with

agg$Group <- seq_len(nrow(agg))

And, finally, change the columns order:

agg <- agg[, c(1, 3, 2)]
agg
#  Store Group           Item
#1    S1     1 I1, I2, I3, I4
#2    S2     2     I1, I2, I3
#3    S3     3     I1, I2, I3
#4    S4     4             I5
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66