How to use the reduceByKey function to get unpaired records

Asked Jun 26 '18 at 20:32

Active Jun 26 '18 at 22:51

Viewed 67 times

I want to merge records that share a key, but I don't want to lose the unpaired records either. For example, I have the following paired RDD:

(key=1, (2, created_on))
(key=1, (3, created_on))
(key=2, (5, created_on))

When I use reduceByKey with a function that keeps the latest 'created_on', it merges the first two records into a single record, the most recent one. This is the correct behavior.
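A minimal sketch of such a reducer, assuming each value is a `(number, created_on)` tuple and `created_on` is a `datetime`. The name `keep_latest` and the timestamps are hypothetical. In PySpark it would be passed as `rdd.reduceByKey(keep_latest)`; here `functools.reduce` applies it to the two key=1 values locally so the sketch runs without Spark.

```python
from datetime import datetime
from functools import reduce

def keep_latest(a, b):
    """Return whichever (number, created_on) value has the later created_on.

    Hypothetical reducer for reduceByKey; on a tie the first argument is kept.
    """
    return a if a[1] >= b[1] else b

# The two key=1 values from the question, with made-up timestamps.
values_for_key_1 = [
    (2, datetime(2018, 6, 1, 12, 0)),
    (3, datetime(2018, 6, 2, 9, 30)),
]

latest = reduce(keep_latest, values_for_key_1)
print(latest)  # (3, datetime.datetime(2018, 6, 2, 9, 30))
```

Because reduceByKey may combine values in any order across partitions, the reducer should be associative and commutative; taking the maximum by `created_on` is, apart from how ties are broken.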

However, the third record is missing. How can I get the unpaired RDD record so that I can union it with the merged RDD?
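A plain-Python sketch of reduceByKey's per-key semantics, runnable without a Spark cluster. The helper `reduce_by_key` is a hypothetical local stand-in, not Spark's API, and the timestamps are made up. It illustrates a point worth checking: reduceByKey emits one record per distinct key, and a key with a single value passes through unchanged (the reducer is never called for it), so key=2 would normally already appear in the merged result. If the unpaired records are still wanted separately, counting values per key isolates them (in PySpark, `rdd.countByKey()` is one way to get such counts); the last lines show that merging only the paired keys and then appending the unpaired ones gives the same result.

```python
from collections import defaultdict
from datetime import datetime
from functools import reduce

def keep_latest(a, b):
    # Keep the (number, created_on) value with the later created_on.
    return a if a[1] >= b[1] else b

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: one output pair per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # reduce() on a one-element list returns that element without calling func,
    # mirroring how a key with a single value passes through reduceByKey.
    return [(key, reduce(func, values)) for key, values in grouped.items()]

pairs = [
    (1, (2, datetime(2018, 6, 1, 12, 0))),
    (1, (3, datetime(2018, 6, 2, 9, 30))),
    (2, (5, datetime(2018, 6, 1, 8, 0))),
]

merged = reduce_by_key(pairs, keep_latest)
print(merged)  # key 2 is present alongside the merged key 1

# Separate keys that occur exactly once ("unpaired") from the rest.
counts = defaultdict(int)
for key, _ in pairs:
    counts[key] += 1
unpaired = [(k, v) for k, v in pairs if counts[k] == 1]
paired_merged = reduce_by_key([(k, v) for k, v in pairs if counts[k] > 1], keep_latest)

combined = paired_merged + unpaired  # local analogue of RDD.union
print(combined == merged)  # True
```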

edited Jun 26 '18 at 22:51


Ani


I'm not sure I understand how you can have an `rdd` like the one you described. Your third record is missing the "key". Should it be something like `(None, (5, created_on))`? Can you edit your question to include your code, the output you're currently getting, and the desired output? – pault Jun 26 '18 at 20:39
Yes, Paul, you are right; I've edited the question. My question is how to get unpaired records, including ones with None keys like the one you mentioned. – Ani Jun 26 '18 at 22:52

Please also include the code you are using. See [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Jun 27 '18 at 00:52


0 Answers