Questions tagged [fpgrowth]

55 questions
4
votes
2 answers

PySpark :: FP-growth algorithm ( raise ValueError("Params must be either a param map or a list/tuple of param maps, ")

I am a beginner with PySpark. I am using FPGrowth to compute associations in PySpark. I followed the steps below. Data Example from pyspark.sql.session import SparkSession spark = SparkSession.builder.getOrCreate() # make some test data columns =…
James Taylor
3
votes
1 answer

Convert StringType Column To ArrayType In PySpark

I have a DataFrame with a column "EVENT_ID" whose datatype is String. I am running the FPGrowth algorithm, but it throws the error below: Py4JJavaError: An error occurred while calling o1711.fit. :java.lang.IllegalArgumentException: requirement failed: The…
user3198755
2
votes
1 answer

spark.databricks.queryWatchdog.outputRatioThreshold Error for FPGrowth using Pyspark on Databricks

I'm working on market basket analysis using PySpark on Databricks. The transactional dataset consists of a total of 5.4 million transactions, with approx. 11,000 items. I'm able to run FPGrowth on the dataset, but whenever I'm trying to either…
2
votes
1 answer

How to efficiently export association rule generated using pyspark in .CSV or .XLSX file in python

After resolving this issue: "How to limit FPGrowth itemesets to just 2 or 3", I am trying to export the association rule output of FPGrowth using PySpark to a .csv file in Python. After running for almost 8-10 hrs, it gives an error. My machine has…
Shubham Bajaj
2
votes
0 answers

How to get Antecedents/Consequents from FPGrowth Algorithm in Pyspark?

How am I misusing or misreading the FPGrowth algorithm in PySpark? I have an Apriori algorithm output that I was hoping would be the same. Provided are my FPGrowth code, my Apriori output, and my FPGrowth output. from pyspark.mllib.fpm import…
2
votes
1 answer

fpgrowth error in R

I am trying to fit an fpgrowth model on a built-in data set called Adult. While fitting the model, I got the error shown below. Error in .jcall(jPruning, "[[Ljava/lang/String;", "fpgrowth", support, : method fpgrowth with signature…
789372u
1
vote
0 answers

Spark MLlib FPGrowth not working with 40+ items in Frequent Item set

Spark FPGrowth works well with millions of transactions (records) when the number of frequent items in the frequent itemset is less than 25. Beyond 25 it runs into a computational limit (executor computing time keeps growing). For 40+ items in the Frequent…
1
vote
2 answers

Compare the annual rates between groups

I am struggling to compare the rates 'of mortality' between two percentages over a time interval. My goal is to get the annual rates per group. My values are already in percentages (start and end values), representing how much forest has been lost…
maycca
1
vote
0 answers

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent error using rCBA::fpgrowth

I have the following dataset for which I want to generate association rules using FP growth > head(order_pairs) # A tibble: 6 x 2 product_A product_B 1 Organic…
code-noob
1
vote
0 answers

How to use the consequent parameter in fpgrowth algorithm in the rCBA package in R?

The items column in the transactions I am passing to the fpgrowth method is of the form { Bag of Organic Bananas, Cornbread Mix, …
kenneth-rebello
1
vote
1 answer

Error calling rCBA::fpgrowth: method fpgrowth with signature (DDI)[[Ljava/lang/String; not found

I wrote the R code below to mine with the FP-Growth algorithm: fpgabdata <- read.csv('../Agen Biasa.csv', header = FALSE) train <- sapply(fpgabdata, as.factor) train <- data.frame(train, check.names = TRUE) txns <-…
Mr Simple
1
vote
0 answers

FP-Growth cannot processing

I have a problem running the FP-growth algorithm in RStudio. This is my first time using R. I wrote the code FpgConf = rCBA::fpgrowth(dataset, support = 0.1, confidence = 0.5, maxLength = 2, consequent = "Species", parallel = FALSE) and then the system…
Mr Simple
1
vote
1 answer

How to interpret results of Mlxtend's association rule

I am using mlxtend to find association rules. Here is the code: df = apriori(dum_data, min_support=0.4, use_colnames=True) rules = association_rules(df, metric="lift", min_threshold=1) rules2=rules[ (rules['lift'] >= 1) & (rules['confidence'] >=…
MAC
1
vote
1 answer

pyspark--FPGrowth: how does transform work on unseen transactions?

I am using pyspark.ml.fpm.FPGrowth in Spark 2.4 and I have a question about how precisely transform works on transactions that are new. My understanding is that model.transform will take each transaction X and find all Y such that Conf(X-->Y) >…
Nick
1
vote
1 answer

Appending column name to column value using Spark

I have data in a comma-separated file, which I have loaded into a Spark DataFrame. The data looks like: A B C 1 2 3 4 5 6 7 8 9 I want to transform the above DataFrame in Spark using PySpark as: A B C A_1 B_2 C_3 A_4 B_5 C_6 …
MAC