How am I misusing or misreading the FPGrowth algorithm in PySpark? I have Apriori output that I expected the FPGrowth output to match. Below are my FPGrowth code, my Apriori output, and my FPGrowth output.

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.mllib.fpm import FPGrowth

sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))
data = sc.textFile("C:\\Users\\marka\\Downloads\\Assig2.txt")
# Each line is treated as one transaction of tab-separated items
transactions = data.map(lambda line: line.strip().split('\t'))
#notempty = transactions.map(lambda x: x is not '')
# FPGrowth requires the items within a transaction to be unique
unique = transactions.map(lambda x: list(set(x))).cache()
model = FPGrowth.train(unique, minSupport=0.7, numPartitions=10)
result = model.freqItemsets().collect()
for fi in result:
    print(fi)
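
The commented-out filter above suggests that splitting on tabs may leave empty strings inside a transaction. A minimal sketch of a per-line cleaner that drops them and removes duplicates in one step; `clean_transaction` is a hypothetical helper name, not part of PySpark, and it is only a guess that empty tokens contribute to the mismatch:

```python
def clean_transaction(line, sep="\t"):
    """Split one input line into a list of unique, non-empty items."""
    tokens = (token.strip() for token in line.strip().split(sep))
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(token for token in tokens if token))


# Plain-Python check, no Spark needed
print(clean_transaction("milk\tbread\t\tmilk\n"))
```

If it helps, it could be swapped in as `unique = data.map(clean_transaction).cache()` in place of the two `map` calls.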

Apriori output: [image]

FPGrowth output: [image]

Am I misinterpreting the results, or is there another way to output the FPGrowth results so that I can interpret them like the Apriori output?

To test, I ran FPGrowth in Weka and got results similar to Apriori, which suggests my PySpark output method is incorrect. However, the documentation always uses for fi in result: print(fi), so I'm unsure what to change.
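
For comparison, here is a rough sketch of how the collected itemsets could be printed with relative support, grouped by itemset size, which is closer to how Apriori-style reports are usually laid out. `format_itemsets` is a hypothetical helper; the namedtuple is only a stand-in with the same `items` and `freq` fields as PySpark's `FreqItemset`, so the snippet runs without Spark, and the sample counts are made up for illustration:

```python
from collections import namedtuple

# Stand-in mirroring the (items, freq) fields of PySpark's FreqItemset
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])


def format_itemsets(itemsets, n_transactions):
    """Return one line per itemset: smaller sets first, then by descending count."""
    rows = sorted(
        itemsets,
        key=lambda fi: (len(fi.items), -fi.freq, sorted(fi.items)),
    )
    return [
        "{%s}  count=%d  support=%.2f"
        % (", ".join(sorted(fi.items)), fi.freq, fi.freq / n_transactions)
        for fi in rows
    ]


# Illustrative data only, not taken from Assig2.txt
sample = [FreqItemset(["b", "a"], 7), FreqItemset(["a"], 9), FreqItemset(["b"], 8)]
for row in format_itemsets(sample, 10):
    print(row)
```

With the code above, this would presumably be called as `format_itemsets(result, unique.count())`, since `freq` is an absolute count while `minSupport` is a fraction of all transactions.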

Weka FPGrowth output: [image]

desertnaut
Mark McGown

0 Answers