I have a large number of key-value pairs (key, value).
For each key, I don't need an average or any other aggregation of the values; I just need one value, and it doesn't matter which one. (In effect, I want the distinct keys, each with any one of its values.)
For example, given:
("1","apple")
("1","apple")
("2","orange")
("2","orange")
("1","apple")
("1","pear")
the result can be either
("2","orange")
("1","apple")
or
("2","orange")
("1","pear")
I can get this with reduceByKey((a, b) => a),
but because there are so many keys, it takes a very long time.
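For reference, here is a minimal sketch of what I'm doing now, assuming Spark's RDD API in Scala. The local master, the app name, and the names `OneValuePerKey`, `onePerKey`, and `pairs` are made up for illustration and are not my real job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OneValuePerKey {
  // Keep one arbitrary value per key: the reduce function simply
  // returns its first argument and ignores the second.
  def onePerKey(pairs: RDD[(String, String)]): RDD[(String, String)] =
    pairs.reduceByKey((a, _) => a)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("one-value-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The sample data from the example above.
    val pairs = sc.parallelize(Seq(
      ("1", "apple"), ("1", "apple"), ("2", "orange"),
      ("2", "orange"), ("1", "apple"), ("1", "pear")
    ))

    // Which value survives for key "1" (apple or pear) is not guaranteed.
    onePerKey(pairs).collect().foreach(println)

    sc.stop()
  }
}
```

The real data set is far larger than this sample, which is where the running time becomes a problem.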
Does anyone have a better suggestion?
Thanks!