I ran into this statement in the Apache Spark source code:

    val (gradientSum, lossSum, miniBatchSize) = data
      .sample(false, miniBatchFraction, 42 + i)
      .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
        seqOp = (c, v) => {
          // c: (grad, loss, count), v: (label, features)
          val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
          (c._1, c._2 + l, c._3 + 1)
        },
        combOp = (c1, c2) => {
          // c: (grad, loss, count)
          (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
        }
      )

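Stripped of Spark, the statement appears to compute three things over the sampled points: the sum of the per-point gradients, the sum of the losses, and the number of points. Below is a minimal sequential sketch in plain Scala, assuming a least-squares loss. computeInto is a hypothetical stand-in for gradient.compute, which in Spark MLlib adds the point's gradient into the vector it is handed and returns the loss; that in-place update is why seqOp can return c._1 as it is.

```scala
object SequentialEquivalent {
  // Hypothetical stand-in for gradient.compute (assumed least-squares loss):
  // adds the gradient (w.x - y) * x into cumGrad in place and returns 0.5 * (w.x - y)^2.
  def computeInto(features: Array[Double], label: Double,
                  weights: Array[Double], cumGrad: Array[Double]): Double = {
    val diff = features.indices.map(i => features(i) * weights(i)).sum - label
    for (i <- features.indices) cumGrad(i) += diff * features(i)
    0.5 * diff * diff
  }

  // The statement without Spark: one foldLeft over (label, features) pairs whose
  // step function does the job of seqOp. With a single accumulator nothing has
  // to be merged, so there is no combOp.
  def gradientLossCount(data: Seq[(Double, Array[Double])],
                        weights: Array[Double]): (Array[Double], Double, Long) =
    data.foldLeft((Array.fill(weights.length)(0.0), 0.0, 0L)) {
      case ((grad, loss, count), (label, features)) =>
        val l = computeInto(features, label, weights, grad) // mutates grad in place
        (grad, loss + l, count + 1)
    }

  def main(args: Array[String]): Unit = {
    val data = Seq((1.0, Array(1.0, 0.0)), (2.0, Array(0.0, 1.0)))
    val (grad, loss, n) = gradientLossCount(data, Array(0.0, 0.0))
    println(s"grad=${grad.mkString("[", ", ", "]")} loss=$loss n=$n") // prints grad=[-1.0, -2.0] loss=2.5 n=2
  }
}
```

The Spark version does the same work, but splits it across partitions, which is where combOp comes in.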
I have several problems reading this:
- First, I can't find anything on the web that explains exactly how treeAggregate works or what its parameters mean.
- Second, .treeAggregate seems to be followed by two sets of parentheses, ()(), after the method name. What does that mean? Is it some special Scala syntax that I don't understand?
- Finally, I see that both seqOp and combOp return a 3-element tuple that matches the tuple on the left-hand side, but which one actually gets returned?
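The ()() is ordinary Scala: treeAggregate is declared with two parameter lists, the zero value in the first and the functions in the second, and seqOp = ... and combOp = ... are plain named arguments. The sketch below uses a hypothetical aggregateLike method with the same shape over hardcoded data. One common reason for this style is type inference: the type U is fixed by the first list before the lambdas in the second list are type-checked, so c, v, a and b need no type annotations (in Scala 2, inference proceeds one parameter list at a time, so this matters).

```scala
object ParamListsDemo {
  // Hypothetical method shaped like treeAggregate: the first parameter list takes
  // the zero value, the second takes the two functions.
  def aggregateLike[U](zero: U)(seqOp: (U, Int) => U, combOp: (U, U) => U): U = {
    // Pretend the data lives in two "partitions".
    val partitions = Seq(Seq(1, 2), Seq(3, 4))
    // Fold each partition with seqOp, then merge the partial results with combOp.
    partitions.map(p => p.foldLeft(zero)(seqOp)).reduce(combOp)
  }

  // One pair of parentheses per parameter list. U = (Int, Long) is inferred
  // from the first list, so the lambda parameters need no annotations.
  val (sum, count) = aggregateLike((0, 0L))(
    seqOp = (c, v) => (c._1 + v, c._2 + 1),
    combOp = (a, b) => (a._1 + b._1, a._2 + b._2)
  )

  def main(args: Array[String]): Unit =
    println(s"sum=$sum count=$count") // prints sum=10 count=4
}
```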
This statement must be really advanced; I can't begin to decipher it.
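On what treeAggregate does and which tuple comes back: it behaves like aggregate, except that partial results are merged in a multi-level tree instead of all at once on the driver. Each partition folds its elements into the zero value with seqOp (here: add one (label, features) point's gradient, loss and count), the per-partition results are then merged with combOp, and the value returned to the caller is the output of that merging, not of seqOp. The following is a simplified local model written for illustration, with each inner Seq standing in for a partition and a pairwise merge standing in for Spark's distributed tree levels. It is not Spark's implementation, and localTreeAggregate is a hypothetical name. The real method also takes an optional depth parameter (default 2) controlling the number of tree levels, which this model ignores.

```scala
object TreeAggregateModel {
  // Simplified local model of treeAggregate's semantics (not Spark's code).
  // Each inner Seq stands for one partition; zero is by-name so every partition
  // starts from a fresh value, which matters when the zero is mutable (like the BDV).
  def localTreeAggregate[T, U](partitions: Seq[Seq[T]], zero: => U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Step 1: inside each partition, fold the elements into the zero value with seqOp.
    var level: List[U] = partitions.map(p => p.foldLeft(zero)(seqOp)).toList
    // Step 2: merge partial results pairwise with combOp, one tree level at a time,
    // until a single value is left.
    while (level.size > 1)
      level = level.grouped(2).map(pair => pair.reduce(combOp)).toList
    // Step 3: the surviving value (a fresh zero if there were no partitions) is what
    // the caller gets; with two or more partitions it is an output of combOp.
    level.headOption.getOrElse(zero)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))
    val result = localTreeAggregate(partitions, (0, 0L))(
      seqOp = (c, v) => (c._1 + v, c._2 + 1),       // (sum, count) within a partition
      combOp = (a, b) => (a._1 + b._1, a._2 + b._2) // merge two partial (sum, count)
    )
    println(result) // prints (21,6)
  }
}
```

With three partitions, seqOp produces (3,2), (3,1) and (15,3); combOp merges them into (21,6), and that merged tuple is what a val (sum, count) = ... pattern on the left-hand side would receive. In Spark, the final merge of partial results also goes through combOp.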