I am moving all my code over to Scala, and I have a PySpark function that I have little clue how to translate. Can anybody help and provide an explanation? The PySpark looks like this:

.aggregateByKey((0.0, 0.0, 0.0),
                         lambda (sum, sum2, count), value: (sum + value, sum2 + value**2, count+1.0),
                         lambda (suma, sum2a, counta), (sumb, sum2b, countb): (suma + sumb, sum2a + sum2b, counta + countb))

Edit: What I have so far is:

val dataSusRDD = numFilterRDD.aggregateByKey((0,0,0), (sum, sum2, count) =>

What I am having trouble with is how to write this in Scala: how the grouped accumulator (sum, sum2, count) is passed into the first function, which produces the new values (sum + value, etc.), and how that result then feeds into the second aggregating function, all with the proper syntax. It's hard to state my difficulty coherently. It's mostly that I don't understand Scala well enough to know when to use brackets vs. parentheses vs. commas.

theMadKing

1 Answer

As @paul suggests, using named functions might make it a bit simpler to understand what's going on.

val initialValue = (0.0,0.0,0.0)
def seqOp(u: (Double, Double, Double), v: Double) = (u._1 + v, u._2 + v*v, u._3 + 1)
def combOp(u1: (Double, Double, Double),  u2: (Double, Double, Double)) = (u1._1 + u2._1, u1._2 + u2._2, u1._3 + u2._3)
rdd.aggregateByKey(initialValue)(seqOp, combOp)
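
To see what the two functions do without running a cluster, here is a minimal sketch that mimics the per-partition fold and the cross-partition merge on plain Scala collections. The object name `AggregateSketch` and the `partitions` data are made up for illustration, and this only approximates how Spark applies the functions to each key's values.

```scala
object AggregateSketch {
  // Accumulator layout: (sum, sum of squares, count)
  type Acc = (Double, Double, Double)

  val initialValue: Acc = (0.0, 0.0, 0.0)

  // Folds one value into a partition-local accumulator.
  def seqOp(u: Acc, v: Double): Acc = (u._1 + v, u._2 + v * v, u._3 + 1)

  // Merges two accumulators that were built on different partitions.
  def combOp(u1: Acc, u2: Acc): Acc = (u1._1 + u2._1, u1._2 + u2._2, u1._3 + u2._3)

  // Hypothetical data: the values of a single key, split over two partitions.
  val partitions: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0), Seq(3.0, 4.0))

  // Each partition is folded on its own, then the partial results are merged.
  val result: Acc = partitions.map(_.foldLeft(initialValue)(seqOp)).reduce(combOp)

  def main(args: Array[String]): Unit = {
    val (sum, sum2, count) = result
    println(s"sum=$sum sum2=$sum2 count=$count")
  }
}
```

The three accumulated numbers are enough to derive a per-key mean (`sum / count`) and a population variance (`sum2 / count - mean * mean`) afterwards, which is a common reason for this accumulator shape, though the question itself does not say what they are used for.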
Holden
  • Hello, I am getting the following issue: scala> val dataStatsRDD = numFilterRDD.aggregateByKey(initialValue, seqOp, combOp) :38: error: too many arguments for method aggregateByKey: (zeroValue: U)(seqOp: (U, Double) => U, combOp: (U, U) => U)(implicit evidence$3: scala.reflect.ClassTag[U])org.apache.spark.rdd.RDD[(Int, U)] val dataStatsRDD = numFilterRDD.aggregateByKey(initialValue, seqOp, combOp) – theMadKing Jun 05 '15 at 13:32
  • My bad, I forgot that in Scala the zero value is passed in its own parameter list (currying). I've updated the answer: `rdd.aggregateByKey(initialValue)(seqOp, combOp)` – Holden Jun 06 '15 at 07:47
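
The "too many arguments" error in the comment above comes from `aggregateByKey` taking its arguments in two parameter lists, as the quoted signature shows. A minimal sketch of that call shape follows, using a hypothetical stand-in method named `aggregate` (not Spark code) so it runs without a Spark dependency. It also shows one way to write the functions inline, which is roughly what the original PySpark lambdas did.

```scala
object CurriedSketch {
  // Hypothetical data standing in for one key's values on two partitions.
  private val chunks: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0), Seq(3.0))

  // Hypothetical stand-in shaped like the signature in the error message:
  // the zero value sits in its own parameter list, the two functions in a second.
  def aggregate[U](zeroValue: U)(seqOp: (U, Double) => U, combOp: (U, U) => U): U =
    chunks.map(_.foldLeft(zeroValue)(seqOp)).reduce(combOp)

  // Passing all three arguments in one list, aggregate(zero, f, g), does not
  // compile; that is the "too many arguments" error.

  // The zero value must use Doubles (0.0, not 0), because U is inferred from it.
  // Braces plus `case` destructure the tuple arguments, the closest Scala
  // equivalent of Python's tuple-unpacking lambdas.
  val stats: (Double, Double, Double) = aggregate((0.0, 0.0, 0.0))(
    { case ((sum, sum2, count), value) => (sum + value, sum2 + value * value, count + 1.0) },
    { case ((sumA, sum2A, countA), (sumB, sum2B, countB)) => (sumA + sumB, sum2A + sum2B, countA + countB) }
  )
}
```

Parentheses delimit each argument list, commas separate the arguments inside a list, and the braces here are only needed because `case` patterns are used to unpack the tuples.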