How to get Antecedents/Consequents from FPGrowth Algorithm in Pyspark?

Question

How am I misusing or misreading the FPGrowth algorithm in PySpark? I have output from an Apriori algorithm and was hoping FPGrowth would produce the same results. Below are my FPGrowth code, my Apriori output, and my FPGrowth output.

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.mllib.fpm import FPGrowth

sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))
data = sc.textFile("C:\\Users\\marka\\Downloads\\Assig2.txt")
# One transaction per line, items separated by tabs
transactions = data.map(lambda line: line.strip().split('\t'))
# FPGrowth requires unique items per transaction; also drop empty fields
unique = transactions.map(lambda x: list({item for item in x if item})).cache()
model = FPGrowth.train(unique, minSupport=0.7, numPartitions=10)
# Each result is a FreqItemset(items, freq); this call returns itemsets, not rules
result = model.freqItemsets().collect()
for fi in result:
    print(fi)

Apriori output:

FPGrowth output:

Am I misinterpreting the results, or is there another way to output the FPGrowth results so that I can interpret them like the Apriori output?

To test, I ran FPGrowth in Weka and got results similar to Apriori, which suggests that the way I output the results in PySpark is incorrect. The documentation always uses for fi in result: print(fi), though, so I'm unsure what to change.

Weka FPGrowth output:

This is not Python, but *pyspark* (edited accordingly tags & title) — desertnaut, Jul 27 '18 at 22:50
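
For reference, freqItemsets() returns only frequent itemsets with their counts, while Apriori-style output lists rules (antecedent => consequent) with a confidence. The following is a minimal sketch, not a confirmed fix, of how such rules could be derived in plain Python from the collected (items, freq) pairs, using confidence = freq(antecedent and consequent together) / freq(antecedent). The function name association_rules, the min_confidence parameter, and the example data are hypothetical and not part of PySpark. The DataFrame-based pyspark.ml.fpm.FPGrowth also exposes an associationRules attribute on the fitted model, which may be the more direct route.

```python
from itertools import combinations


def association_rules(freq_itemsets, min_confidence=0.9):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets.

    freq_itemsets: iterable of (items, freq) pairs, for example the list
    returned by model.freqItemsets().collect() (assumption: each element
    unpacks into an item list and an absolute count).
    """
    # Lookup table: itemset -> absolute frequency
    support = {frozenset(items): freq for items, freq in freq_itemsets}
    rules = []
    for itemset, freq in support.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # Try every non-empty proper subset as the antecedent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                antecedent_freq = support.get(antecedent)
                if antecedent_freq is None:
                    continue  # skip if the subset's count is unavailable
                confidence = freq / antecedent_freq
                if confidence >= min_confidence:
                    consequent = itemset - antecedent
                    rules.append((sorted(antecedent), sorted(consequent), confidence))
    return rules


# Made-up data for illustration only, not the contents of Assig2.txt
example = [(["a"], 4), (["b"], 5), (["a", "b"], 4)]
for antecedent, consequent, confidence in association_rules(example, 0.9):
    print(antecedent, "=>", consequent, round(confidence, 2))
```

With the code from the question, the call would be association_rules(model.freqItemsets().collect()), assuming the collected itemsets fit in driver memory.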
