I'm having trouble with the `arulesSequences` package in R.
I have a transaction dataset with temporal information (here I'll use the `zaki` example dataset that ships with the package). I use SPADE (the `cspade` function) to find the frequent subsequences in the dataset:
```r
library(arulesSequences)
data(zaki)
frequent_sequences <- cspade(zaki, parameter = list(support = 0.5))
```
What I want is to find, for each sequence (i.e. for each customer), which frequent subsequences it supports. I tried various combinations of `%in%` and `subset` without much success.
For example, for the second customer, the initial transactions (`inspect(zaki[zaki@itemsetInfo$sequenceID == 2])`) are:
```
  items   sequenceID eventID SIZE
5 {A,B,F}          2      15    3
6 {E}              2      20    1
```
The frequent sequences in the whole dataset (`inspect(frequent_sequences)`) are:
```
   items                support
1  <{A}>                   1.00
2  <{B}>                   1.00
3  <{D}>                   0.50
4  <{F}>                   1.00
5  <{A, F}>                0.75
6  <{B, F}>                1.00
7  <{D}, {F}>              0.50
8  <{D}, {B, F}>           0.50
9  <{A, B, F}>             0.75
10 <{A, B}>                0.75
11 <{D}, {B}>              0.50
12 <{B}, {A}>              0.50
13 <{D}, {A}>              0.50
14 <{F}, {A}>              0.50
15 <{D}, {F}, {A}>         0.50
16 <{B, F}, {A}>           0.50
17 <{D}, {B, F}, {A}>      0.50
18 <{D}, {B}, {A}>         0.50
```
What I'd like to see is that customer 2 supports the frequent sequences 1, 2, 4, 5, 6, 9 and 10, but does not support the others.
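As a reference point (this is not an `arulesSequences` API), here is a minimal base-R sketch of the containment test that yields exactly this list. It assumes no gap or window constraints are in play (the `cspade` call above only sets `support`). The helper `contains_pattern()` and the objects `customer2` and `frequent` are hypothetical names, hand-transcribed from the printouts above rather than read from the `sequences` object.

```r
# Hypothetical helper (not part of arulesSequences): TRUE if `events`, a list
# of itemsets ordered by eventID, contains `pattern`, a list of itemsets.
# Each pattern element must be a subset of some event that comes after the
# event matched by the previous element. Matching greedily to the earliest
# possible event is sufficient because no gap/window constraints are applied.
contains_pattern <- function(events, pattern) {
  k <- 1L
  for (ev in events) {
    if (k > length(pattern)) break
    if (all(pattern[[k]] %in% ev)) k <- k + 1L
  }
  k > length(pattern)
}

# Customer 2, transcribed from inspect() above: eventID 15, then eventID 20
customer2 <- list(c("A", "B", "F"), "E")

# The 18 frequent sequences, in the order printed by inspect(frequent_sequences)
frequent <- list(
  list("A"), list("B"), list("D"), list("F"),
  list(c("A", "F")), list(c("B", "F")),
  list("D", "F"), list("D", c("B", "F")),
  list(c("A", "B", "F")), list(c("A", "B")),
  list("D", "B"), list("B", "A"), list("D", "A"), list("F", "A"),
  list("D", "F", "A"), list(c("B", "F"), "A"),
  list("D", c("B", "F"), "A"), list("D", "B", "A")
)

# One TRUE/FALSE per frequent sequence: does customer 2 contain it?
supported <- vapply(frequent, contains_pattern, logical(1), events = customer2)
which(supported)
#> [1]  1  2  4  5  6  9 10
```

Running the same `vapply` call with each customer's event list in place of `customer2` gives the per-customer view.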
I could also settle for the reverse information: which base sequences support a given frequent subsequence? `cspade` must have this information at some point, since the support of a frequent sequence is computed from exactly the set of sequences that contain it.
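A sketch of the reverse direction, under the same no-gap-constraint assumption: evaluate the containment test for every (pattern, sequence) pair to get a logical incidence matrix. Reading a column answers the per-customer question, reading a row answers the per-pattern question, and the row means reproduce the `support` column above. `contains_pattern()` is a hypothetical helper, not a package function, and `zaki_seqs` is my own transcription of the `zaki` example data, so check it against `inspect(zaki)`. With the real object you would more likely build it from the transactions, e.g. something like `split(as(zaki, "list"), zaki@itemsetInfo$sequenceID)`, assuming events are already sorted by `eventID` within each sequence.

```r
# Hypothetical helper (not part of arulesSequences): TRUE if `events`
# (a list of itemsets ordered by eventID) contains `pattern` (a list of
# itemsets), ignoring gap/window constraints; greedy earliest matching.
contains_pattern <- function(events, pattern) {
  k <- 1L
  for (ev in events) {
    if (k > length(pattern)) break
    if (all(pattern[[k]] %in% ev)) k <- k + 1L
  }
  k > length(pattern)
}

# Assumed transcription of the zaki data: one list of itemsets per sequenceID
zaki_seqs <- list(
  "1" = list(c("C", "D"), c("A", "B", "C"), c("A", "B", "F"), c("A", "C", "D", "F")),
  "2" = list(c("A", "B", "F"), "E"),
  "3" = list(c("A", "B", "F")),
  "4" = list(c("D", "G", "H"), c("B", "F"), c("A", "G", "H"))
)

# The 18 frequent sequences, in the order printed by inspect(frequent_sequences)
frequent <- list(
  list("A"), list("B"), list("D"), list("F"),
  list(c("A", "F")), list(c("B", "F")),
  list("D", "F"), list("D", c("B", "F")),
  list(c("A", "B", "F")), list(c("A", "B")),
  list("D", "B"), list("B", "A"), list("D", "A"), list("F", "A"),
  list("D", "F", "A"), list(c("B", "F"), "A"),
  list("D", c("B", "F"), "A"), list("D", "B", "A")
)

# 18 x 4 logical matrix: incidence[i, j] is TRUE if sequence j contains pattern i
incidence <- sapply(zaki_seqs, function(s)
  vapply(frequent, contains_pattern, logical(1), events = s))

# Per customer: which frequent sequences does customer 2 support?
which(incidence[, "2"])
#> [1]  1  2  4  5  6  9 10

# Per pattern: which sequences support pattern 12, <{B}, {A}>?
names(which(incidence[12, ]))
#> [1] "1" "4"

# Support is the fraction of sequences that contain each pattern
support <- rowMeans(incidence)
isTRUE(all.equal(support, c(1, 1, 0.5, 1, 0.75, 1, 0.5, 0.5, 0.75, 0.75, rep(0.5, 8))))
#> [1] TRUE
```

Brute-force pairwise checking like this is fine for a handful of sequences; SPADE itself avoids it by keeping per-pattern id-lists, which is why the information exists internally even though the sketch recomputes it.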
It seems like this should be easy (and it probably is!), but I can't figure it out.
Any ideas?