
I am trying to mine frequent itemsets and association rules from data stored in a CSV file. I learned about the arules package in R and decided to use it.

I'm having trouble creating a data frame from the CSV.

My CSV file essentially has the data in the following format:

transactionid,items
1,"milk,beer,diapers"
2,"coke,milk,eggs"
3,"diapers,eggs,coke"

Could anyone help me create a data frame that I can pass to the apriori() or eclat() functions of the arules package?

Thanks!

  • I'm guessing he also wants to split the items. So adapting from [here](http://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame): `df <- read.csv("test.csv", stringsAsFactors = FALSE)` and then `cbind(df[,1, F], with(df, data.frame(do.call(rbind, strsplit(items, ',', fixed=TRUE)))))` If the number of items isn't constant then it's probably better to use `separate` from `tidyr` or something similar. – Molx Sep 27 '15 at 01:28
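When the number of items varies from row to row, a long format (one row per transaction/item pair) avoids padding columns altogether. Here is a minimal base-R sketch. It assumes the column names `transactionid` and `items` from the question, and the name `long` is only illustrative:

```r
# Sample data from the question (assumed columns: transactionid, items)
df <- read.csv(text = 'transactionid,items
1,"milk,beer,diapers"
2,"coke,milk,eggs"
3,"diapers,eggs,coke"', stringsAsFactors = FALSE)

# Split each basket string into a character vector of items
items <- strsplit(df$items, split = ",", fixed = TRUE)

# One row per (transaction, item) pair; works for any number of items per row
long <- data.frame(
  transactionid = rep(df$transactionid, lengths(items)),
  item          = unlist(items),
  stringsAsFactors = FALSE
)
print(long)
```

A long table like this can then be converted with arules, for example with `as(split(long$item, long$transactionid), "transactions")`.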

1 Answer

It sounds like you want to import data from a CSV file into an arules transactions object, which apriori() and eclat() accept as input.

df <- read.csv(text='transactionid,items
               1,"milk,beer,diapers"
               2,"coke,milk,eggs"
               3,"diapers,eggs,coke"',
               stringsAsFactors=FALSE)

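library(arules)
# split each comma-separated basket string into a character vector of items
lst        <- strsplit(df$items, split = ",", fixed = TRUE)
# use the transaction IDs as list names so they become the transactionID labels
names(lst) <- df$transactionid
# coerce the named list to an arules "transactions" object
trans      <- as(lst, "transactions")
inspect(trans)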
#   items     transactionID
# 1 {beer,                 
#    diapers,              
#    milk}                1
# 2 {coke,                 
#    eggs,                 
#    milk}                2
# 3 {coke,                 
#    diapers,              
#    eggs}                3
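Once the data is a transactions object, you can pass it straight to eclat() or apriori(). Here is a sketch on the same three toy transactions. The thresholds (supp = 0.5, conf = 0.5, minlen = 2) are arbitrary choices for this tiny example, not recommendations:

```r
library(arules)

# Rebuild the transactions object from the sample data above
df <- read.csv(text = 'transactionid,items
1,"milk,beer,diapers"
2,"coke,milk,eggs"
3,"diapers,eggs,coke"', stringsAsFactors = FALSE)
lst <- strsplit(df$items, split = ",", fixed = TRUE)
names(lst) <- df$transactionid
trans <- as(lst, "transactions")

# Frequent itemsets with eclat(); supp = 0.5 keeps itemsets found in at least half the transactions
itemsets <- eclat(trans, parameter = list(supp = 0.5))
inspect(itemsets)

# Association rules with apriori(); minlen = 2 drops rules with an empty left-hand side
rules <- apriori(trans,
                 parameter = list(supp = 0.5, conf = 0.5, minlen = 2, target = "rules"))
inspect(rules)
```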

You should also take a look at the read.transactions() function, which reads transaction data directly from a file in either "basket" or "single" format.
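read.transactions() expects either a basket file (one transaction per line, items separated by `sep`) or a single file (one transaction ID and one item per line). The question's file mixes an ID column with a quoted item list. One option, sketched here as an assumption rather than the only route, is to rewrite it in the single format first. The temporary file and the `long` data frame are illustrative only:

```r
library(arules)

df <- read.csv(text = 'transactionid,items
1,"milk,beer,diapers"
2,"coke,milk,eggs"
3,"diapers,eggs,coke"', stringsAsFactors = FALSE)
items <- strsplit(df$items, split = ",", fixed = TRUE)

# Write a temporary "single" format file: one "transactionid,item" line per item
long <- data.frame(
  transactionid = rep(df$transactionid, lengths(items)),
  item          = unlist(items),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".csv")
write.table(long, tmp, sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)

# cols = c(1, 2): column 1 holds the transaction ID, column 2 holds one item
trans <- read.transactions(tmp, format = "single", sep = ",", cols = c(1, 2))
inspect(trans)
```

If you control how the data is exported, writing it in this single format from the start lets read.transactions() load it without the reshaping step.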

jlhoward