I'm using the following code to generate association rules with the FP-Growth algorithm:
model.generateAssociationRules(minConfidence).collect().foreach { rule =>
  println(
    rule.antecedent.mkString("[", ",", "]")
      + " => " + rule.consequent.mkString("[", ",", "]")
      + ", " + rule.confidence)
}
But whenever I try to run the algorithm on a big table with 100 million records, it fails with a Java heap space error.
What is the alternative to calling collect() when running the FP-Growth algorithm on a large dataset?
I'm using Spark 1.6.2 with Scala 2.10.
Solution code (collecting one partition at a time):
import org.apache.spark.mllib.fpm.FPGrowth

// Pull the frequent itemsets to the driver one partition at a time, so only a
// single partition's results are ever held in driver memory. MasterMap1 is a
// mutable map on the driver; size is the total transaction count.
val parts1 = model.freqItemsets.partitions
parts1.foreach { p =>
  val idx1 = p.index
  val partRdd1 = model.freqItemsets.mapPartitionsWithIndex {
    case (index: Int, value: Iterator[FPGrowth.FreqItemset[String]]) =>
      if (index == idx1) value else Iterator()
  }
  // freq.toDouble avoids integer division, which would truncate the support to 0
  partRdd1.collect().foreach { itemset =>
    MasterMap1(itemset.items.mkString(",").replace(" ", "")) =
      (itemset.freq.toDouble / size).toString
  }
}
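That said, if the per-partition loop is only being used to persist the supports, an alternative that avoids collect() entirely is to compute the relative support on the executors and write the results straight to storage. A minimal sketch, assuming model is the trained FPGrowthModel[String], size is the total transaction count, and the output path is a placeholder:

model.freqItemsets
  .map(itemset =>
    // Build "items<TAB>support" lines on the executors; nothing reaches the driver
    itemset.items.mkString(",").replace(" ", "") + "\t" +
      (itemset.freq.toDouble / size))
  .saveAsTextFile("/tmp/freq-itemsets") // placeholder output path

This keeps driver memory flat no matter how many itemsets are generated, at the cost of reading the results back from disk instead of from a driver-side map.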