
I have been trying to do a sequential analysis of products purchased after a certain period of time: for example, which product combinations are purchased by customers after 7 days, and what proportion of customers purchase such combinations. I have tried the arulesSequences package, but my data is structured in a way that doesn't fit the package. I have user ID, date of purchase, product ID and product name in columns. I have searched a lot but haven't found a clear way to do this.

Dayy        UID         leaf_category_name  leaf_category_id
5/1/2018    47      Cubes               38860
5/1/2018    272     Pastas & Noodles    34616
5/1/2018    1827    Flavours & Spices   34619
5/1/2018    3505    Feature Phones      1506

This is the kind of data I have; UID stands for user ID, and leaf category is, in simple terms, the product purchased. I have a huge dataset with 2,049,278 rows.

Code I have tried:

library(Matrix)
library(arules)
library(arulesSequences)

library(arulesViz)

#splitting data into transactions
transactions <- as(split(data$leaf_category_id, data$UID), "transactions")

frequent_sequences <- cspade(transactions, parameter=list(support=0.5))
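The `split` by `UID` above drops the purchase dates, but cspade needs both a sequence ID (the customer) and an event ID (the time of each purchase) in the transaction info. A hedged sketch of one way to build that shape, using a toy data frame that mirrors the columns in the question (the column names `Dayy`, `UID`, `leaf_category_id` are taken from the sample above; values are made up):

```r
library(arules)
library(arulesSequences)

# toy data mirroring the question's columns
df <- data.frame(
  Dayy = as.Date(c("2018-05-01", "2018-05-01", "2018-05-08", "2018-05-08")),
  UID  = c(47, 47, 47, 272),
  leaf_category_id = c("38860", "34616", "34619", "1506"),
  stringsAsFactors = FALSE
)

# cspade requires transactions ordered by sequence then event
df <- df[order(df$UID, df$Dayy), ]

# one transaction per (customer, day); keep the order stable
sess <- factor(paste(df$UID, df$Dayy),
               levels = unique(paste(df$UID, df$Dayy)))
trans <- as(split(df$leaf_category_id, sess), "transactions")

# attach the sequenceID / eventID columns cspade looks for
info <- unique(df[, c("UID", "Dayy")])
transactionInfo(trans) <- data.frame(
  sequenceID = info$UID,
  eventID    = as.integer(info$Dayy)  # integer timestamps (days)
)

seqs <- cspade(trans, parameter = list(support = 0.25),
               control = list(verbose = FALSE))
inspect(seqs)
```

This is a sketch under the assumption that each (UID, day) pair is one shopping event; the support threshold is a placeholder you would tune on the real data.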

and

# Convert tabular data to sequences. Item is in
# column 1, sequence ID is column 2, and event ID is column 3.
seqs = make_sequences(data, item_col = 1, sid_col = 2, eid_col = 3)             

# generate frequent sequential patterns with minimum
# support of 0.1 and maximum of 6 elements
fseq = spade(seqs, 0.1, 6)

I want to look at sequences of products being purchased after a certain number of days. Can someone help me with this?

Thank You

  • Please share a sample of your data and the code you've tried. See more here [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Tung Jun 27 '18 at 06:21
  • Question is insufficiently focused. No data. No code. No algorithm. https://stackoverflow.com/help/mcve – IRTFM Jun 27 '18 at 06:22
  • @Tung thank you for your suggestion, I have edited in the required things. – mragakshi agarwal Jun 27 '18 at 07:00

1 Answer


The apriori path is quite nice. However, since we don't have your data, we can use a well-known dataset as an example, like Groceries (in your case, you would subset your data after the date you want):

library(arules)
data(Groceries)

# here you can see the products with the biggest support
frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15))
inspect(frequentItems)
     items                         support    count
[1]  {other vegetables,whole milk} 0.07483477  736 
[2]  {whole milk}                  0.25551601 2513 
[3]  {other vegetables}            0.19349263 1903 
[4]  {rolls/buns}                  0.18393493 1809 
[5]  {yogurt}                      0.13950178 1372 
[6]  {soda}                        0.17437722 1715 
[7]  {root vegetables}             0.10899847 1072 
[8]  {tropical fruit}              0.10493137 1032 
[9]  {bottled water}               0.11052364 1087 
[10] {sausage}                     0.09395018  924 
[11] {shopping bags}               0.09852567  969 
[12] {citrus fruit}                0.08276563  814 
[13] {pastry}                      0.08896797  875 
[14] {pip fruit}                   0.07564820  744 
[15] {whipped/sour cream}          0.07168277  705 
[16] {fruit/vegetable juice}       0.07229283  711 
[17] {newspapers}                  0.07981698  785 
[18] {bottled beer}                0.08052872  792 
[19] {canned beer}                 0.07768175  764 

If you prefer, you can plot it:

itemFrequencyPlot(Groceries, topN=5, type="absolute")

Then you can see the association rules:

association <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5))
inspect(head(association))


  lhs                                           rhs                support     confidence lift     count
[1] {rice,sugar}                               => {whole milk}       0.001220132 1          3.913649 12   
[2] {canned fish,hygiene articles}             => {whole milk}       0.001118454 1          3.913649 11   
[3] {root vegetables,butter,rice}              => {whole milk}       0.001016777 1          3.913649 10   
[4] {root vegetables,whipped/sour cream,flour} => {whole milk}       0.001728521 1          3.913649 17   
[5] {butter,soft cheese,domestic eggs}         => {whole milk}       0.001016777 1          3.913649 10   
[6] {citrus fruit,root vegetables,soft cheese} => {other vegetables} 0.001016777 1          5.168156 10   

In the last column you can see the count, i.e. how many times each rule appears: this can be read as "how many rows" and, if each row is a customer, as the number of customers. However, you have to think about what you mean by "how many customers": for example, should a,b,a,c give count = 4, or count = 3 because the repeated item is counted once (pseudocode)? You have to evaluate your data for this.
Edit: lastly, you can have a look at this; as you've stated, there is also the cspade algorithm that can help.
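A minimal cspade sketch using `zaki`, the small example sequence dataset bundled with arulesSequences; the `mingap`/`maxgap` parameters are what encode "purchased after a certain number of days", measured in the units of the event ID (the thresholds below are placeholders):

```r
library(arulesSequences)

data(zaki)  # example sequence data shipped with the package

# frequent sequences with at least 40% support
s <- cspade(zaki, parameter = list(support = 0.4),
            control = list(verbose = FALSE))
inspect(s)

# mingap / maxgap constrain the time between consecutive elements
# of a pattern, in eventID units (e.g. days): here, consecutive
# purchases at least 1 and at most 7 time units apart
s_gap <- cspade(zaki, parameter = list(support = 0.4, mingap = 1, maxgap = 7),
                control = list(verbose = FALSE))
inspect(s_gap)
```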

s__
  • thank you for the answer, but I have already done the analysis you have shared; arules gives the products purchased together irrespective of the time interval. Having run the apriori algorithm, I also wanted to look at product combinations within a certain time frame: for example, if I purchase a phone today, I might end up purchasing a cover a week later. I am looking for such an analysis, of products that can be marketed a certain time period after a purchase. – mragakshi agarwal Jun 27 '18 at 08:41
  • You could subset your data by certain dates (e.g. from 01.01.2018 to 08.01.2018), create a "basket" for each customer in each time subset, then apply apriori to each subset; or is there another approach you would consider correct? – s__ Jun 27 '18 at 08:46
  • I wanted to do my analysis on weekly data; I have transactions for 4 months. If I subset by 7 days, it would give me many small subsets, and it might be wrong as a whole. Is there any way I can use a package like arulesSequences, which can be applied to the whole data at once? – mragakshi agarwal Jun 27 '18 at 09:01
  • Well, using the `lubridate` package you can use the function `week()` to get each week automatically. However, I don't understand what you mean by "on the whole": if you need an analysis for each week, I suppose a whole-data analysis could be skipped. Lastly, no, I can only recommend that you apply apriori to each week. – s__ Jun 27 '18 at 09:12
  • What I meant by whole data is: applying your idea of subsetting the data, suppose for the 1st week I get the combination of rice and sugar being purchased with 0.05 confidence, for the 2nd week I get the same combination with 0.6 confidence, and so on; at the end I have to show the combined result of my analysis. How exactly will I show the results: will I say that with 0.05 confidence they are purchased together, or with 0.6 confidence? So, I was looking for an algorithm that could give me product combinations purchased across all the weeks, with combined confidence, support and lift. – mragakshi agarwal Jun 27 '18 at 09:25
  • How do you imagine your output? I can imagine only two ways: first, weekly rice-and-sugar confidence, so you can see the evolution of that pair; second, the whole data, to get a total result. Does it make sense, in your opinion, to calculate the rice-and-sugar confidence for each week, then find a total result? – s__ Jun 27 '18 at 11:41
  • thank you for your responses, you have been a great help. I tried your method: I subsetted the data for 1 week and ran the apriori algorithm on it, but I got just 1 product combination with lift greater than 1. Is there any way to find more such combinations? – mragakshi agarwal Jun 27 '18 at 12:17
  • You are welcome! You can try to lower the support and the confidence to get more combinations, but sometimes the data tell you something you do not want; it's you who decides how much to ask for. However, lift is not the only indicator to look at: you also have confidence and support. – s__ Jun 27 '18 at 12:24
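The weekly-subsetting idea from this comment thread can be sketched as follows. The data frame is toy data in the question's column layout (`Dayy`, `UID`, `leaf_category_id` are the question's names; values and thresholds are made up):

```r
library(arules)
library(lubridate)

# toy data in the question's shape
df <- data.frame(
  Dayy = as.Date(c("2018-05-01", "2018-05-03", "2018-05-09", "2018-05-10")),
  UID  = c(47, 272, 47, 272),
  leaf_category_id = c("38860", "34616", "34619", "1506"),
  stringsAsFactors = FALSE
)

# label each purchase with its week of the year
df$week <- week(df$Dayy)

# one rule set per week; each customer's weekly purchases form a basket
rules_by_week <- lapply(split(df, df$week), function(w) {
  trans <- as(split(w$leaf_category_id, w$UID), "transactions")
  apriori(trans, parameter = list(supp = 0.01, conf = 0.5),
          control = list(verbose = FALSE))
})
```

On the toy data each weekly basket holds a single item, so the rule sets come back empty; on real data you would then `inspect()` each element of `rules_by_week` and compare the same rule's support/confidence across weeks.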