I'm using Spark with Scala and I have an RDD of Tuple2 elements, each holding a complex object as the key and a Double as the value. The aim is to sum the Doubles (the frequencies) when the key objects are identical.
To that end, I've defined my object as follows:
case class SimpleCoocurrence(word: String, word_pos: String,
                             cooc: String, cooc_pos: String,
                             distance: Double) extends Ordered[SimpleCoocurrence] {
  def compare(that: SimpleCoocurrence): Int =
    if (this.word.equals(that.word) && this.word_pos.equals(that.word_pos)
        && this.cooc.equals(that.cooc) && this.cooc_pos.equals(that.cooc_pos))
      0
    else
      this.toString.compareTo(that.toString)
}
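For what it's worth, here's a small sanity check I ran locally (plain Scala, no Spark; the field values like "dog"/"bark" are just made-up sample data) to confirm that my compare returns 0 when the four word fields match, and that the case class also compares equal structurally:

```scala
// Same class as above, reproduced standalone for the check.
case class SimpleCoocurrence(word: String, word_pos: String,
                             cooc: String, cooc_pos: String,
                             distance: Double) extends Ordered[SimpleCoocurrence] {
  def compare(that: SimpleCoocurrence): Int =
    if (this.word.equals(that.word) && this.word_pos.equals(that.word_pos)
        && this.cooc.equals(that.cooc) && this.cooc_pos.equals(that.cooc_pos))
      0
    else
      this.toString.compareTo(that.toString)
}

val a = SimpleCoocurrence("dog", "NN", "bark", "VB", 1.0)
val b = SimpleCoocurrence("dog", "NN", "bark", "VB", 1.0)
println(a.compare(b)) // 0: my compare treats them as the same key
println(a == b)       // true: case-class structural equality also holds
```

So the objects do look equal to each other outside of Spark.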
Now I'm trying to use reduceByKey like this:
val coocRDD = sc.parallelize(coocList)
println(coocRDD.count)
coocRDD.map(tup=>tup).reduceByKey(_+_)
println(coocRDD.count)
But the result shows that the RDD contains exactly the same number of elements before and after the reduceByKey.
How can I perform a reduceByKey on an RDD[(SimpleCoocurrence, Double)]? Is implementing the Ordered trait the right way to tell Spark how to compare my objects, or should I just use (String, Double) pairs instead?
Thanks,