I have a subset of a database exported as a CSV file with several columns, and I would like to convert the data into transactions. I've already read this post, which proposes the following:

library(arules)
library(arulesViz)

trans = read.transactions("data.csv", format = "single", sep = ",",
                     cols = c("EMAIL", "BRAND"))

However, I wasn't able to convert my data with the proposed solution. My data looks like this:

CATEGORY   BRAND   SKU   EMAIL         SEGMENT   SALES
shorts     gap     1564  one@mail.x    1         1
tops       gap     8974  one@mail.x    1         2
shoes      nike    3245  two@mail.x    4         3
jeans      levis   8956  two@mail.x    4         1

Now I want to use arules to understand what brands customers generally buy together. In order to use arules I need to convert my data so it looks as follows:

gap, gap
nike, levis

Can anybody help me figure out how to convert my data accordingly?

Davis

1 Answer

If we treat the EMAIL column as a transaction ID, we can convert your data.frame (called df here) to class transactions with:

library(arules)
trans <- as(split(df[,"BRAND"], df[,"EMAIL"]), "transactions")

# To explore the rules we could do
rules <- apriori(trans)
inspect(rules)
#  lhs        rhs     support confidence lift
#1 {levis} => {nike}  0.5     1          2   
#2 {nike}  => {levis} 0.5     1          2   
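
For anyone who wants to reproduce this without the CSV file, here is a minimal sketch that rebuilds the four sample rows from the question by hand and runs the same conversion. The column types are an assumption (the question does not state them), and the name `baskets` is only illustrative.

```r
library(arules)

# Sample rows copied from the question; column types are assumed
df <- data.frame(
  CATEGORY = c("shorts", "tops", "shoes", "jeans"),
  BRAND    = c("gap", "gap", "nike", "levis"),
  SKU      = c(1564, 8974, 3245, 8956),
  EMAIL    = c("one@mail.x", "one@mail.x", "two@mail.x", "two@mail.x"),
  SEGMENT  = c(1, 1, 4, 4),
  SALES    = c(1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# One vector of brands per EMAIL, i.e. one basket per customer
baskets <- split(df$BRAND, df$EMAIL)

# Coerce the list of baskets to a transactions object.
# A warning about duplicated items is expected here, because
# one@mail.x bought "gap" twice.
trans <- as(baskets, "transactions")

# Show each customer's basket as a set of brands
inspect(trans)
```

Each transaction is a set of items, so the two "gap" purchases of the first customer collapse into a single {gap} item.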
mtoto
  • Thanks, this transformed the data into the right format. However, I received the following message: `Warning message: In asMethod(object) : removing duplicated items in transactions`, and when I try to `inspect(rules[1:5])` I receive the following error: `Error in slot(x, s)[i] : subscript out of bounds`. Do you know what is causing this? – Davis Aug 25 '16 at 11:07
  • You can't have the same item predicting itself, as in gap => gap, hence the removal of duplicates. As for the second one, you'll need to change the values of the `support = ` and `confidence = ` arguments (passed through `parameter = list(...)`) in the `apriori()` call to get more rules. – mtoto Aug 25 '16 at 12:15
  • Thank you, this makes much more sense. I fixed the `confidence = ` and `support = ` parameters. How exactly do I remove duplicates from the transaction dataset? Can I simply remove duplicates before I transform the data into transactions with `df <- unique(df[ , c(2,4) ] )`? – Davis Aug 26 '16 at 08:46
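  • You don't need to remove them; they are removed for you automatically when the data is coerced to `transactions` (that is what the warning reports), so `apriori()` never sees them. – mtoto Aug 26 '16 at 08:49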
  • I ran everything again and it works like a charm. Thank you so much for your help! – Davis Aug 26 '16 at 10:42
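
The two points raised in the comments can be sketched together as follows. This is a rough illustration, not the exact code used in the thread: the thresholds are placeholders chosen for the four sample rows, and `pairs` and `n` are hypothetical names. Dropping repeated (EMAIL, BRAND) pairs up front is optional, but it avoids the warning; capping the index at `length(rules)` avoids the subscript error when fewer than five rules are found.

```r
library(arules)

# Only the two columns needed for the conversion, taken from the question
df <- data.frame(
  BRAND = c("gap", "gap", "nike", "levis"),
  EMAIL = c("one@mail.x", "one@mail.x", "two@mail.x", "two@mail.x"),
  stringsAsFactors = FALSE
)

# Drop repeated (EMAIL, BRAND) pairs so the coercion has nothing to remove
pairs <- unique(df[, c("EMAIL", "BRAND")])
trans <- as(split(pairs$BRAND, pairs$EMAIL), "transactions")

# Thresholds go in the parameter list; these values are illustrative only.
# minlen = 2 skips rules with an empty left-hand side.
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.8, minlen = 2),
                 control = list(verbose = FALSE))

# Never index past the number of rules actually found
n <- min(5, length(rules))
if (n > 0) inspect(rules[seq_len(n)])
```

On real data the thresholds usually need to be much lower than 0.5; lowering `support` and `confidence` is what yields more rules.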