Optimization of Market Basket Analysis in R code

Question

I am trying out something along the lines of market basket analysis, but with a twist. Suppose I want to run market basket analysis on customers in different predefined segments and obtain the association rules for every item in the market basket. Here is the working code:

    library(arules)

    rf <- NULL
    options(scipen = 999)
    options(digits = 10)
    for (i in ori_distinct_char) {  # loop over the customer segments
        # Subset the rows belonging to the current segment
        subset <- ori[which(ori$V3 == as.character(i)), ]
        # One transaction per customer (column 1) holding that customer's items (column 2)
        subset_data <- as(split(as.vector(subset[, 2]), as.vector(subset[, 1])), "transactions")
        food <- unique(subset$V2)
        for (j in food) {  # loop over the items seen in this segment
            # Minimum support and confidence are set as low as possible so that
            # more rules are generated (there is little data)
            rules_food <- apriori(subset_data,
                                  parameter = list(supp = 0.0000001, conf = 0.0000001,
                                                   minlen = 2, target = "rules"),
                                  appearance = list(lhs = as.character(j), default = "rhs"))
            rules_food <- sort(rules_food, by = "confidence", decreasing = TRUE)
            rf_temp <- as(head(rules_food, 50), "data.frame")
            if (nrow(rf_temp) != 0) {
                rf <- rbind(rf, cbind(rf_temp, segment = as.character(i)))
            }
        }
    }

I am looking for a way to run this script so that every combination is processed in parallel, i.e. association rules are mined for each customer segment and each food item concurrently, covering all possible combinations. Otherwise the working script is too slow: imagine 5 segments and 2,000 food choices.

Update: my attempt using a 'foreach' loop so far:

    cl <- makeCluster(3)
    registerDoParallel(cl)

    l <- NULL
    rf <- NULL
    rf_temp <- NULL
    options(scipen = 999)
    options(digits = 10)
    foreach (i = 1:length(ori_distinct_char)) %dopar% {  # different customer segments
        # subsetting different segments
        subset <- ori[which(ori$V3 == paste(i, sep = "")), ]
        subset_data <- as(split(as.vector(subset[, 2]), as.vector(subset[, 1])), "transactions")
        food <- unique(subset$V2)
        foreach (o = 1:length(food), .combine = rbind, .packages = 'arules') %dopar% {
            # minimum support and confidence set as low as possible to allow more rules to be defined (due to lack of data)
            rules_food <- apriori(subset_data,
                                  parameter = list(supp = 0.0000001, conf = 0.0000001, minlen = 2, target = "rules"),
                                  appearance = list(lhs = paste(j, sep = ""), default = 'rhs'))
            rules_food <- sort(rules_food, by = c("confidence"), decreasing = TRUE)
            rf_temp <- as(head(rules_food, 50), "data.frame")
            if (nrow(rf_temp) != 0) {
                rf <- rbind(rf, cbind(rf_temp, paste(i, sep = "")))
            }
        }
    }
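Judging only from the code shown, a few things in this attempt look likely to stop it from working. The inner loop variable is `o` while the body still refers to `j`. `i` is now a position rather than a segment value, so it only matches when the segments happen to be labelled 1, 2, and so on. The outer `foreach` does not pass `.packages = 'arules'`, so its workers probably cannot build the "transactions" object. Assigning to `rf` inside a worker does not change `rf` in the main session, because `foreach` only returns the value of its body. One possible way around the nesting is to flatten the segment/item pairs into a single task list, as in the minimal sketch below. The toy `ori` data frame and the names `tasks`, `seg`, `item`, `rows` and `out` are assumptions made for illustration.

```r
library(arules)
library(foreach)
library(doParallel)

# Hypothetical toy data in the same shape as `ori`:
# V1 = customer, V2 = food item, V3 = pre-defined segment
ori <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D"),
  V2 = c("milk", "apple", "milk", "apple", "beer", "cider", "beer", "cider"),
  V3 = c("1", "1", "1", "1", "2", "2", "2", "2"),
  stringsAsFactors = FALSE
)

# One task per (segment, item) pair that actually occurs in the data
tasks <- unique(ori[, c("V3", "V2")])

cl <- makeCluster(2)
registerDoParallel(cl)

# A single flat loop: each iteration mines the rules for one pair and
# returns a data frame (or NULL); .combine = rbind stacks the results.
rf <- foreach(k = seq_len(nrow(tasks)), .combine = rbind,
              .packages = "arules") %dopar% {
  seg  <- tasks$V3[k]
  item <- tasks$V2[k]
  rows <- ori[ori$V3 == seg, ]
  trans <- as(split(rows$V2, rows$V1), "transactions")
  rules <- apriori(trans,
                   parameter = list(supp = 0.0000001, conf = 0.0000001,
                                    minlen = 2, target = "rules"),
                   appearance = list(lhs = item, default = "rhs"),
                   control = list(verbose = FALSE))
  rules <- head(sort(rules, by = "confidence", decreasing = TRUE), 50)
  out <- as(rules, "data.frame")
  # The value of the block is what foreach collects
  if (nrow(out) > 0) cbind(out, segment = seg) else NULL
}

stopCluster(cl)
```

Each task rebuilds the transactions for its segment, which is wasteful. Building them once per segment and passing them to the workers would be a natural refinement if this approach turns out to help.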

I would suggest the `foreach` package, but cannot answer until you provide some data showing a minimal example. — aichao, Aug 02 '16 at 20:41
Can you `dput` this data and the expected output? Please see [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) as a guideline. — aichao, Aug 03 '16 at 02:51
Hi, exactly what I thought. An example of the data (first 4 rows of the ori table) looks like this:
customer | food choice | pre-defined segment
V1 | V2 | V3
A | milk | 1
A | apple | 1
B | beer | 2
B | cider | 2
— skw1990, Aug 03 '16 at 02:54
The output from source data set: dput(head(ori,1)) structure(list(V1 = 9986736710, V2 = structure(2208L, .Label = c("BEER", "APPLE","EGG","ORANGE"), class = "factor"), V3 = structure(5L, .Label = c("1", "2", "3", "4", "5"), class = "factor")), .Names = c("V1", "V2", "V3"), row.names = 1L, class = "data.frame") — skw1990, Aug 03 '16 at 03:07
aichao, I have updated my question with my most recent attempt using 'foreach', but I do not think it is working (it does not run at all), so I need help with that. Thanks in advance. — skw1990, Aug 03 '16 at 03:17
ori_distinct_char = unique(ori$V3); food = unique(ori$V2); where ori_distinct_char is the vector of pre-defined segments and food is the vector of items. If more rows are required, I think they will exceed the maximum number of characters allowed here. The basic idea behind my code is this: for every defined customer segment, I run the MBA and obtain the association rules for each food item, which is why you see two for loops here. To cover all segments I need to iterate over every element of ori_distinct_char, and likewise for food. — skw1990, Aug 03 '16 at 06:04
dput(head(ori_2,50)) structure(list(V1 = 1:50, v2 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 3L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 2L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 3L, 38L, 39L, 40L, 41L, 42L, 26L, 25L, 43L, 44L, 45L), v3 = c(1L, 2L, 1L, 2L, 2L, 3L, 4L, 5L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 2L, 4L, 1L, 1L, 4L, 2L, 1L, 2L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 1L, 4L, 4L, 2L, 4L, 1L, 1L, 4L, 3L, 4L, 2L, 4L, 2L, 2L, 3L, 2L)), .Names = c("V1", "v2", "v3"), row.names = c(NA, 50L), class = "data.frame") — skw1990, Aug 03 '16 at 06:04
Above is the dput of 50 rows, where I masked the data for P&C reasons. — skw1990, Aug 03 '16 at 06:05
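With data shaped like the samples above (customer, item, segment), there may also be a way to avoid the inner loop entirely. Each call in the question restricts the left-hand side to one item, and `apriori` rules have a single right-hand-side item, so every rule the loops produce should contain exactly two items. Under that reading, one `apriori` run per segment with `minlen = 2, maxlen = 2` should generate the same candidate rules, which can then be grouped by left-hand side. The sketch below illustrates the idea on hypothetical toy data. `top_rules_per_item` is an illustrative helper, not part of arules, and the equivalence is an assumption worth checking against the original output.

```r
library(arules)

# Hypothetical toy data in the same shape as `ori`:
# V1 = customer, V2 = food item, V3 = pre-defined segment
ori <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D"),
  V2 = c("milk", "apple", "milk", "apple", "beer", "cider", "beer", "cider"),
  V3 = c("1", "1", "1", "1", "2", "2", "2", "2"),
  stringsAsFactors = FALSE
)

# Illustrative helper: mine all two-item rules for one segment in a single
# apriori() call, then keep at most n rules per left-hand-side item.
top_rules_per_item <- function(rows, n = 50) {
  trans <- as(split(rows$V2, rows$V1), "transactions")
  rules <- apriori(trans,
                   parameter = list(supp = 0.0000001, conf = 0.0000001,
                                    minlen = 2, maxlen = 2, target = "rules"),
                   control = list(verbose = FALSE))
  df <- as(rules, "data.frame")
  if (nrow(df) == 0) return(NULL)
  df$lhs <- labels(lhs(rules))                 # e.g. "{milk}"
  df <- df[order(df$lhs, -df$confidence), ]    # best rules first within each item
  keep <- ave(seq_len(nrow(df)), df$lhs, FUN = seq_along) <= n
  df[keep, ]
}

# One apriori() run per segment instead of one per (segment, item) pair
by_segment <- lapply(split(ori, ori$V3), top_rules_per_item)
rf <- do.call(rbind, Map(function(df, seg) {
  if (!is.null(df)) cbind(df, segment = seg)
}, by_segment, names(by_segment)))
```

With 5 segments this means 5 `apriori` calls rather than one per food item per segment, and the per-segment calls could still be distributed with `foreach` if needed.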
