I am trying to understand Spark's combineByKey API, which the article below uses to compute an average by key.
I read this article
http://abshinn.github.io/python/apache-spark/2014/10/11/using-combinebykey-in-apache-spark/
but I don't know if it's just me... I don't understand how this API works.

The most confusing part is `(x[0] + y[0], x[1] + y[1])`.

My understanding was that `x` is the sum and `y` is the count, so why are we adding a sum and a count together?
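For reference, here is a minimal, self-contained PySpark sketch of the averaging pattern that article describes (the sample data and variable names are my own, not from the article); the comments spell out what each of the three lambdas actually receives:

```python
from pyspark import SparkContext

sc = SparkContext("local", "combineByKey-average-example")

# Hypothetical sample data: (key, value) pairs spread across partitions.
data = sc.parallelize([("a", 1), ("a", 3), ("b", 5), ("b", 7), ("b", 9)])

sum_count = data.combineByKey(
    # createCombiner: turns the first value seen for a key in a partition
    # into an accumulator tuple of (sum, count).
    lambda value: (value, 1),
    # mergeValue: folds one more raw value into an existing (sum, count)
    # accumulator within the same partition.
    lambda acc, value: (acc[0] + value, acc[1] + 1),
    # mergeCombiners: merges two accumulators built on different partitions.
    # Here x and y are BOTH (sum, count) tuples, so x[0] + y[0] adds the two
    # partial sums and x[1] + y[1] adds the two partial counts.
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)

# Divide each key's total sum by its total count to get the average.
averages = sum_count.mapValues(lambda pair: pair[0] / pair[1])
print(averages.collect())  # e.g. [('a', 2.0), ('b', 7.0)]
```

In other words, in the third lambda neither argument is a bare sum or a bare count: both are partial (sum, count) accumulators coming from different partitions, and the tuple expression adds them element-wise.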