Questions tagged [pattern-mining]
31 questions
8
votes
3 answers
What is the difference between "Sequential Pattern Mining" and "Sequential Rule Mining"
The documentation for the very powerful open-source data mining tool SPMF lists them separately:
http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php
Does anyone know why?
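As a rough illustration of the distinction (a minimal sketch, not SPMF's implementation): a sequential pattern is scored only by its support, the fraction of sequences that contain it as a subsequence, while a sequential rule X → Y also carries a confidence, the share of sequences containing X in which Y follows. The toy database `db` and the helper names `contains`, `support`, and `rule_confidence` are hypothetical, and both sides of the rule are simplified to ordered lists with one item per step.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as a (possibly non-contiguous) subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(db, pattern):
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


def rule_confidence(db, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (consequent occurs afterwards)."""
    ante = support(db, antecedent)
    if ante == 0:
        return 0.0
    return support(db, antecedent + consequent) / ante


# Hypothetical toy sequence database.
db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]

print(support(db, ["a", "c"]))            # pattern <a, c>: 0.75
print(rule_confidence(db, ["a"], ["b"]))  # rule a -> b: 0.5
```

The pattern `<a, c>` is reported with a support alone; the rule `a -> b` adds a measure of how reliably `b` follows `a`.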

R Claven
3
votes
1 answer
Extract the Lift and Support from Association Rules using Spark
I'm using the Frequent Pattern Mining algorithm - Association Rules:
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
val freqItemsets = sc.parallelize(Seq(
new FreqItemset(Array("a"),…
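For reference, the quantities in the title can be derived from itemset supports alone. The following is a plain-Python sketch, not the Spark API: it assumes the frequent itemsets have already been collected into a dictionary of relative supports (frequency divided by the number of transactions). The name `rule_metrics` and the toy values are hypothetical.

```python
def rule_metrics(itemset_support, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `itemset_support` maps frozenset(items) -> relative support in [0, 1].
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    support = itemset_support[a | c]           # P(A and C)
    confidence = support / itemset_support[a]  # P(C | A)
    lift = confidence / itemset_support[c]     # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical relative supports for a toy dataset.
itemset_support = {
    frozenset({"a"}): 0.6,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 0.4,
}

support, confidence, lift = rule_metrics(itemset_support, ["a"], ["b"])
print(support, round(confidence, 3), round(lift, 3))
```

A lift above 1 suggests the antecedent and consequent co-occur more often than they would if independent.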

João_testeSW
3
votes
1 answer
Apriori, arulesSequences, in R : Does it have support for sequence of "baskets" (order within single shopping trip doesn't matter)?
I'm getting started with arulesSequences, aiming to perform Frequent Sequence Mining on some data I have. The data for store A looks like this:
CUSTOMER_ID seq_num Size bought_items
1 17399 1 2 {100,100}
2 …

ednaMode
2
votes
2 answers
Generate image matrix from Freeman chain code
Suppose I have an 8-direction Freeman chain code as follows, in a Python list:
freeman_code = [3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5]
Where directions would be defined as follows:
I need to convert this to an image matrix of variable dimensions…
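One possible approach, as a minimal sketch: walk the chain from a start pixel, collect the visited coordinates, then size the matrix to their bounding box. The direction figure is not reproduced in this excerpt, so the mapping below (0 = east, numbered counterclockwise, row 0 at the top of the image) is an assumption, and `OFFSETS` and `chain_to_matrix` are hypothetical names.

```python
# Assumed convention (the original figure is missing): 0 = east, then
# counterclockwise. Offsets are (row, col) with row 0 at the top.
OFFSETS = {
    0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1),
    4: (0, -1), 5: (1, -1), 6: (1, 0), 7: (1, 1),
}


def chain_to_matrix(code, start=(0, 0)):
    """Trace a Freeman chain code and return a 0/1 matrix cropped to the path."""
    r, c = start
    points = [(r, c)]
    for direction in code:
        dr, dc = OFFSETS[direction]
        r, c = r + dr, c + dc
        points.append((r, c))

    # Shift coordinates so the bounding box starts at (0, 0).
    min_r = min(p[0] for p in points)
    min_c = min(p[1] for p in points)
    height = max(p[0] for p in points) - min_r + 1
    width = max(p[1] for p in points) - min_c + 1

    matrix = [[0] * width for _ in range(height)]
    for pr, pc in points:
        matrix[pr - min_r][pc - min_c] = 1
    return matrix


freeman_code = [3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5]
image = chain_to_matrix(freeman_code)
print(len(image), len(image[0]))  # rows, columns of the bounding box
```

If the intended convention differs, only the `OFFSETS` table needs to change.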

toing_toing
2
votes
1 answer
TraMineR, Extract all present combination of events as dummy variables
Let's say I have this data. My objective is to extract combinations of sequences.
I have one constraint: the time between two events may not be more than 5; let's call this maxGap.
User <- c(rep(1,3)) # One user
Event <- c("C","B","C") # Say…

Developer
1
vote
0 answers
Association rule mining with R (arules package)
I have a given dataset about the orders of a store.
| Order.ID | Category | Sub.Category | Product.Name |
| --------------- | -------- | ------------ | ------------ |
| 1 | 2 | Furnishings | ProductName1 |
| 2 …

justAUser
1
vote
0 answers
How to interpret Apyori (Apriori algorithm) output for association rules longer than 2?
I have implemented the Apriori algorithm to find frequent itemsets and association rules on my dataset, and the Apyori library in Python gives me these results:
Motif Support Confidence Lift
0 [05M09T, 05M093] 0.066946 0.524590 …

Tryzis
1
vote
2 answers
Implementations for Pattern/String mining using Suffix Arrays/Trees
I am trying to solve a pattern mining problem for strings and I think that suffix trees or arrays might be a good option to solve this problem.
I will quickly outline the problem:
I have a set of strings of different lengths (quotation marks are just to mark…

Pearson
1
vote
2 answers
Efficiently break up a string based on the nth occurrence of a substring using R
Introduction
Given a string in R, is it possible to get a vectorized solution (i.e. no loops) that breaks the string into blocks, where each block is delimited by the nth occurrence of a substring in the string?
Work done with Reproducible…

NM_
1
vote
1 answer
Why does BIDE use the semi-maximum period for search space pruning?
According to the article, which defines BIDE:
BIDE: Efficient Mining of Frequent Closed Sequences
Theorem 2 (BackScan search space pruning):
Let the prefix sequence be an n-sequence, $S_p = e_1 e_2 \ldots e_n$. If $\exists i\ (1 \le i \le n)$ and there exists an item $e'$ which…

inf3rno
0
votes
1 answer
Finding a continuous route through a list of lists
I am trying to find a continuous (with strictly increasing values) path through a list of lists. I have tried various recursive and reversed approaches, but have failed for hours.
The problem stems from interval-based pattern mining. Here, each…
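The excerpt is truncated, so the exact problem is not visible here. Under one possible reading (pick exactly one value from each inner list, in order, so that the picked values strictly increase), recursion is not needed: greedily taking the smallest value that exceeds the previous pick finds a path whenever one exists. The sketch below rests on that assumption, and `increasing_path` is a hypothetical name.

```python
from bisect import bisect_right


def increasing_path(lists):
    """Pick one value per inner list so the picks strictly increase.

    Returns the list of picks, or None if no such path exists.
    Assumes one pick per inner list, taken in the given order.
    """
    path = []
    previous = float("-inf")
    for values in lists:
        ordered = sorted(values)
        # Index of the first value strictly greater than the previous pick.
        i = bisect_right(ordered, previous)
        if i == len(ordered):
            return None  # nothing in this list is large enough
        previous = ordered[i]
        path.append(previous)
    return path


print(increasing_path([[3, 1], [2, 5], [4, 6]]))  # [1, 2, 4]
print(increasing_path([[5], [3]]))                # None
```

Taking the smallest feasible value at each step leaves the widest range open for the remaining lists, which is why the greedy choice never rules out a path that exists.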

lnxdx
0
votes
0 answers
How to do association rule mining with different characteristics of one variable?
I am working on a task where I have data about patients with the variables reportid (date of treatment), type (T for treatment, S for symptoms, C for conditions) and names (the assigned name of the treatment, symptom or condition).
I…

Marlene
0
votes
0 answers
PySpark PrefixSpan with Structured Streaming
I need the top n most frequently occurring consecutive subsequences (i.e., more like substrings) of the 2nd column.
Is it possible to use Structured Streaming with PrefixSpan?
Can anyone help me with it?
I am new to PySpark and would love it if…

Sai Aravind
0
votes
0 answers
How to represent a numerical dataset as a tree search for an interval pattern mining purpose
I am a PhD student in data mining and I want to use constraint programming to solve pattern mining tasks.
Knowing that constraint programming is based on a tree search, I would like to know if there is a common way to represent the data of a…

djawed bkh
0
votes
0 answers
Extracting the FP-tree from a PySpark FPGrowth MLlib model
Has anybody tried doing this? It is possible to extract frequent itemsets and association rules, but what about the tree? Or, if the tree is not used internally, how could it be reconstructed?
Link to the…

studentofml