How to limit FPGrowth itemsets to just 2 or 3

Question

I am running the FPGrowth algorithm with PySpark on Python 3.6 in a Jupyter notebook. When I try to save the association rules, the output is huge, so I want to limit the number of items in the antecedent and consequent of each rule. Here is the code I have tried. I also changed the Spark context parameters.

Maximum Pattern Length fpGrowth (Apache) PySpark

from pyspark.sql.functions import col, size
from pyspark.ml.fpm import FPGrowth
from pyspark.sql import Row
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark import SparkConf

conf = SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '100G')
        .set('spark.driver.memory', '400G')
        .set('spark.driver.maxResultSize', '200G'))
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)

# lol is a list of lists of transactions, e.g. [['a','b'], ['c','a','e'], ...]
R = Row('ID', 'items')
df = spark.createDataFrame([R(i, x) for i, x in enumerate(lol)])
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.7, minConfidence=0.9)

model = fpGrowth.fit(df)
# Keep only rules with 2 items in the antecedent and 1 item in the consequent
ar = model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)

ar.cache()
ar.toPandas().to_csv('output.csv')

It gives an error:


   TypeError                                 Traceback (most recent call last)
   <ipython-input-1-f90c7a9f11ae> in <module>

   ---> 73 ar=model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)
   TypeError: 'str' object is not callable

Can someone help me solve this issue?

Here, lol is a list of lists of transactions: [['a','b'], ['c','a','e'], ...]

Python 3.6.5, PySpark, Windows 10
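The `where(size(...))` filters in the code above run on the rules after FPGrowth has already produced them, so they shrink what gets written to CSV but not the mining work itself (as far as I know, `pyspark.ml.fpm.FPGrowth` has no parameter that caps itemset length). Below is a minimal plain-Python sketch of the same filter, assuming each rule is a record with list-valued `antecedent` and `consequent` fields like the columns of `model.associationRules`. The `Rule` type, the `limit_rule_sizes` helper, and the sample rules are hypothetical, for illustration only:

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical plain-Python stand-in for one row of model.associationRules."""
    antecedent: List[str]
    consequent: List[str]
    confidence: float


def limit_rule_sizes(rules: List[Rule], antecedent_len: int, consequent_len: int) -> List[Rule]:
    """Keep rules whose sides have exactly the requested number of items.

    Mirrors .where(size(col('antecedent')) == antecedent_len)
            .where(size(col('consequent')) == consequent_len)
    """
    return [
        rule
        for rule in rules
        if len(rule.antecedent) == antecedent_len
        and len(rule.consequent) == consequent_len
    ]


# Made-up rules for illustration only (not output from a real model).
rules = [
    Rule(["a"], ["b"], 0.95),
    Rule(["a", "c"], ["e"], 0.92),
    Rule(["a", "c", "e"], ["b"], 0.91),
    Rule(["b", "c"], ["a", "e"], 0.90),
]

kept = limit_rule_sizes(rules, antecedent_len=2, consequent_len=1)
for rule in kept:
    print(rule.antecedent, "=>", rule.consequent, rule.confidence)
```

Only the rule with a two-item antecedent and a one-item consequent survives; every other rule was still computed before being discarded.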

Can you please show us the complete error message? The following works fine: `ar=model.associationRules.where(F.size(F.col('antecedent')) == 2).where(F.size(F.col('consequent')) == 1)` `ar.show()`. Please keep in mind that the column is called consequent and not cosequent. — cronoik, Jun 30 '19 at 21:49
It comes from `from pyspark.sql import functions as F`. The error message you get is not a Spark issue; it is a Python issue. Have you defined a variable named `size` or `col` somewhere? Try restarting your Jupyter kernel. – cronoik, Jul 01 '19 at 08:06
Done, thanks. I imported the functions module as F and then used F.col and F.size. I have one more question, which I will post shortly; please take a look at it. – Shubham Bajaj, Jul 01 '19 at 19:11
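The cause described in the comments can be reproduced without Spark: once a notebook cell rebinds a name like `col` to a string, any later call `col(...)` raises exactly this TypeError, while a function reached through a module alias is unaffected. A minimal sketch; `_col` and the `SimpleNamespace` standing in for `pyspark.sql.functions` are hypothetical placeholders, not the real PySpark objects:

```python
import types


def _col(name):
    """Hypothetical stand-in for pyspark.sql.functions.col."""
    return f"Column<{name}>"


# Hypothetical stand-in for `from pyspark.sql import functions as F`.
F = types.SimpleNamespace(col=_col)

# Like `from pyspark.sql.functions import col` at the top of the notebook.
col = _col

# A later cell reuses the same name for a plain string...
col = "consequent"

# ...so calling it now calls a str and fails.
try:
    col("antecedent")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# The attribute on the aliased module still refers to the function.
print(F.col("antecedent"))
```

Restarting the kernel clears the rebinding, and using the `F.` prefix avoids the name clash in the first place.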

Shubham Bajaj · Answer 1 · 2019-07-02T08:10:14.833


The discussion above, together with the following link, helped me resolve the problem:

'str' object is not callable TypeError

   # Import the module under an alias so a variable named col or size cannot shadow these functions
   import pyspark.sql.functions as func
   model.associationRules.where(func.size(func.col('antecedent')) == 1).where(func.size(func.col('consequent')) == 1).show()
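For the title's goal (itemsets of just 2 or 3 items), the same size-based filter should also work on the frequent itemsets, e.g. `model.freqItemsets.where(func.size(func.col('items')).between(2, 3))`. To show what that restriction means on the `lol` data format from the question, here is a brute-force plain-Python sketch. It is not FP-Growth and not Spark's implementation; the helper name and sample transactions are hypothetical, and because it enumerates combinations it only suits small examples. `min_support` is a fraction of transactions, analogous to FPGrowth's `minSupport`:

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def frequent_itemsets_of_size(
    transactions: List[List[str]],
    sizes: Sequence[int] = (2, 3),
    min_support: float = 0.5,
) -> Dict[FrozenSet[str], int]:
    """Count every itemset of the given sizes and keep the frequent ones.

    Brute force (not FP-Growth): each transaction contributes all of its
    k-item combinations for each k in `sizes`.
    """
    counts: Counter = Counter()
    for transaction in transactions:
        unique_items = sorted(set(transaction))  # ignore duplicate items
        for k in sizes:
            for combo in combinations(unique_items, k):
                counts[frozenset(combo)] += 1
    min_count = min_support * len(transactions)
    return {items: freq for items, freq in counts.items() if freq >= min_count}


# Hypothetical transactions in the same shape as `lol`.
lol = [["a", "b"], ["c", "a", "e"], ["a", "b", "c"], ["a", "c"]]

result = frequent_itemsets_of_size(lol, sizes=(2, 3), min_support=0.5)
for items, freq in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(items), freq)
```

With these four transactions and a 50% threshold, only {a, b} and {a, c} appear in at least two transactions; no 3-item set does.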

edited Jul 02 '19 at 08:10

answered Jul 01 '19 at 19:13


Could you please add the actual solution to your answer? This will make your answer more valuable and help others find an answer more quickly. – cronoik Jul 02 '19 at 05:47
