I have an RDD of (key, value) pairs in which the value is a case class. I need to reduce this RDD to (key, ArrayBuffer(values)). Based on the comments below, the typical way is to use the `reduceByKey` method. However, I wanted to know whether I can do this with `reduceByKey`, as it is more efficient according to this article.
Viewed 160 times
-3
-
`groupByKey` is a method on pair RDDs. – Alec Sep 04 '16 at 06:45
-
OK, how do I write the function to add all values of a key to a collection? I don't know how to do that. – Mahdi Sep 04 '16 at 06:47
-
By reading the documentation. Seriously, this is in probably every tutorial on Spark, and you should be reading the Spark Scala doc. @alec has pointed you at a function that does what you need; go read the doc on it. – The Archetypal Paul Sep 04 '16 at 07:32
-
Unless you add some more information to your post, I would pretty much agree with @TheArchetypalPaul on this. So I would say, read as much as you can of the Scala documentation. If you still find things you don't understand, modify your question and we will gladly explain and answer it in as much detail as possible. – jtitusj Sep 04 '16 at 08:56
-
Now your question doesn't make sense. It says the typical way is by `reduceByKey` but you want to know if you can do `reduceByKey`. – The Archetypal Paul Sep 04 '16 at 09:50
-
@TheArchetypalPaul Don't suggest solutions like this. In the worst case scenario these are _incredibly_ bad; in the best case scenario they don't improve over `groupByKey` at all. – zero323 Sep 04 '16 at 10:15
-
@TheArchetypalPaul Given the vague description provided by the OP, the only reasonable solution has already been provided (`groupByKey`). You simply cannot do better than that if you require this particular signature and make no assumptions about data distribution. – zero323 Sep 04 '16 at 10:57
-
Thanks Sarvesh for pointing me to the proper answer.. – Mahdi Sep 04 '16 at 22:53
1 Answer
0
// Assume pairRdd is the RDD that contains the (key, value) pairs.
val groupedPairRDD = pairRdd.groupByKey()
`groupedPairRDD` is the output you are after: an RDD of (key, Iterable[value]) pairs, that is, each key together with the collection of all values recorded for it.
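Since the question asks for an `ArrayBuffer` rather than the `Iterable` that `groupByKey` returns, here is a minimal sketch of two ways to get that exact shape. The case class `Record`, the object name `GroupValuesSketch`, the `String` key type and the sample data are all hypothetical stand-ins for whatever the asker actually has. The second variant uses `aggregateByKey`, which is in the same family as `reduceByKey`; as zero323 notes in the comments, it should not be expected to beat `groupByKey` here, because every value still has to be shuffled to the partition that owns its key.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

// Hypothetical case class standing in for the asker's value type.
case class Record(id: Int, label: String)

object GroupValuesSketch {

  // groupByKey yields (key, Iterable[Record]); mapValues copies each
  // Iterable into an ArrayBuffer to match the requested signature.
  def viaGroupByKey(pairRdd: RDD[(String, Record)]): RDD[(String, ArrayBuffer[Record])] =
    pairRdd.groupByKey().mapValues(values => ArrayBuffer(values.toSeq: _*))

  // aggregateByKey builds the buffers directly: the first function appends
  // one value to a partition-local buffer, the second merges two buffers.
  // No data is discarded, so the shuffle volume is not reduced.
  def viaAggregateByKey(pairRdd: RDD[(String, Record)]): RDD[(String, ArrayBuffer[Record])] =
    pairRdd.aggregateByKey(ArrayBuffer.empty[Record])(
      (buffer, value) => buffer += value,
      (left, right) => left ++= right
    )

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("group-values-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairRdd = sc.parallelize(Seq(
        "a" -> Record(1, "x"),
        "b" -> Record(2, "y"),
        "a" -> Record(3, "z")
      ))
      viaGroupByKey(pairRdd).collect().foreach(println)
      viaAggregateByKey(pairRdd).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If an `Iterable` is acceptable downstream, the plain `pairRdd.groupByKey()` call above is enough and the `mapValues` copy can be dropped.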

Hokam