
I am trying to generate association rules using Spark with Scala. I first build an FPGrowth model and pass its frequent itemsets to the AssociationRules method.

However, I wish to add a maximum pattern length parameter, to limit the number of items I want on the LHS and RHS. I only want one-to-one associations between items.

    import org.apache.spark.mllib.fpm.{AssociationRules, FPGrowth}

    // Mine frequent itemsets with FP-growth
    val model = new FPGrowth()
      .setMinSupport(0.1)
      .setNumPartitions(10)
      .run(transactions)

    // Generate association rules from the frequent itemsets found by FP-growth
    val ar = new AssociationRules().setMinConfidence(0.6)
    val results = ar.run(model.freqItemsets)

The resulting association rules are:

ItemA => ItemB, {confidence}

ItemB => ItemC, {confidence}

ItemA,ItemB => ItemC, {confidence}

ItemA,ItemD => ItemE, {confidence}

But I only want it to return results that have one item on both sides, i.e.:

ItemA => ItemB, {confidence}

ItemB => ItemC, {confidence}

Basically, I am looking for a way to specify a maximum pattern length in Spark (Scala or Java).

Any suggestions?

koiralo
nupur.g

1 Answer


You can filter the results:

    val ar = new AssociationRules().setMinConfidence(0.6)
    val results = ar.run(model.freqItemsets)
      .filter(rule => rule.antecedent.size == 1 && rule.consequent.size == 1)
Jeffrey Chung
  • The one big disadvantage is that this still searches for the longer rules, so we spend time and, more problematically, memory on associations we are not interested in. – Roelant Feb 08 '18 at 08:03
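
One way to reduce the cost raised in the comment above is to drop the longer frequent itemsets before rule generation instead of filtering the rules afterwards. The sketch below assumes the RDD-based `org.apache.spark.mllib.fpm` API used in the question; the function name `oneToOneRules` and its parameters are hypothetical. As far as I am aware, `FPGrowth` itself has no maximum pattern length setting, so FP-growth still mines the longer itemsets; only the `AssociationRules` step is spared.

```scala
import org.apache.spark.mllib.fpm.{AssociationRules, FPGrowth}
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of Spark: returns only one-to-one rules.
def oneToOneRules(
    transactions: RDD[Array[String]],
    minSupport: Double,
    minConfidence: Double): RDD[AssociationRules.Rule[String]] = {

  // FP-growth still mines itemsets of every length here.
  val model = new FPGrowth()
    .setMinSupport(minSupport)
    .run(transactions)

  // Keep itemsets with one or two items. A two-item set {A, B} can only
  // yield A => B and B => A, and the one-item sets are needed to look up
  // the antecedent frequencies when computing confidence.
  val shortItemsets = model.freqItemsets.filter(_.items.length <= 2)

  // Rules are generated from the reduced input only.
  new AssociationRules()
    .setMinConfidence(minConfidence)
    .run(shortItemsets)
}
```

Calling `oneToOneRules(transactions, 0.1, 0.6)` should produce the same rules as the filter in the answer, since a rule with a single-item antecedent and a single-item consequent can only come from a two-item frequent itemset.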