How can I remove sub-sequences from the output of the cspade algorithm in the arulesSequences package in R? For example, suppose my data (Sample.txt) is as below.
Column names: sequenceID, eventID, size, item
1 1 1 A
1 2 1 B
1 3 1 C
1 4 1 D
2 1 1 A
2 2 1 B
2 3 1 C
3 1 1 A
3 2 1 B
3 3 1 C
3 4 1 D
After running the arulesSequences code below:
library("arulesSequences")
# Sample.txt must not contain the column-name header row when it is imported
SymptomArulesSeq <- read_baskets("Sample.txt", sep = "[ \t]+", info = c("sequenceID", "eventID", "size"))
s1 <- cspade(SymptomArulesSeq, parameter = list(support = 0.1), control = list(verbose = TRUE), tmpdir = tempdir())
summary(s1)
as(s1, "data.frame")
sequence support
<{A}> 1
<{B}> 1
<{C}> 1
<{D}> 0.6666667
<{A},{D}> 0.6666667
<{B},{D}> 0.6666667
<{C},{D}> 0.6666667
<{B},{C},{D}> 0.6666667
<{A},{C},{D}> 0.6666667
<{A},{B},{C},{D}> 0.6666667
<{A},{B},{D}> 0.6666667
<{A},{C}> 1
<{B},{C}> 1
<{A},{B},{C}> 1
<{A},{B}> 1
How can I find the full-length sequences without losing the items in between?
From the data, the main full-length sequences starting from A are A (1), A->B (1), A->B->C (1) and A->B->C->D (0.67). How can I remove the intermediate sub-sequences and get only these results?
The challenge is how to eliminate the sequences that are formed in between, like B, B->C, etc., and also how to eliminate sequences like A->B->D (here I'm losing the actual sequence, because item C is discarded).
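
To make the desired result concrete, here is a minimal base-R sketch of the kind of filtering I mean. It is only an illustration under stated assumptions, not something arulesSequences provides: it keeps a mined sequence only if it is a contiguous prefix of at least one input sequence. The names events, mined, prefix_labels, allowed and full_length are hypothetical; mined is a hand-typed stand-in for as(s1, "data.frame"), and the sketch assumes one item per event and sequence labels formatted exactly as in the output above.

```r
# Raw events, mirroring Sample.txt (the size column is omitted; it is always 1 here)
events <- data.frame(
  sequenceID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  eventID    = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4),
  item       = c("A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Stand-in for as(s1, "data.frame"): values copied from the output above
mined <- data.frame(
  sequence = c("<{A}>", "<{B}>", "<{C}>", "<{D}>", "<{A},{D}>", "<{B},{D}>",
               "<{C},{D}>", "<{B},{C},{D}>", "<{A},{C},{D}>", "<{A},{B},{C},{D}>",
               "<{A},{B},{D}>", "<{A},{C}>", "<{B},{C}>", "<{A},{B},{C}>", "<{A},{B}>"),
  support  = c(1, 1, 1, 2/3, 2/3, 2/3, 2/3, 2/3, 2/3, 2/3, 2/3, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Items of each input sequence, in event order
events <- events[order(events$sequenceID, events$eventID), ]
items_by_seq <- split(events$item, events$sequenceID)

# Hypothetical helper: labels of all contiguous prefixes of one sequence,
# written in the same "<{A},{B}>" notation that cspade prints
prefix_labels <- function(items) {
  vapply(seq_along(items), function(k) {
    paste0("<", paste0("{", items[seq_len(k)], "}", collapse = ","), ">")
  }, character(1))
}

# Every prefix that occurs in the input data
allowed <- unique(unlist(lapply(items_by_seq, prefix_labels)))

# Keep only mined sequences that are such a prefix; supports are left untouched
full_length <- mined[as.character(mined$sequence) %in% allowed, ]
print(full_length)
```

With the sample data this keeps only A, A->B, A->B->C and A->B->C->D, with the supports cspade reported. The open question is whether cspade itself (for example through its gap-related parameters) can produce this directly instead of post-filtering.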