
I have two RDDs, each containing key-value pairs:

rdd1:

('a', 1)
('b', 2)

rdd2:

('a', 3)
('c', 2)

I want to combine them into a PySpark SQL DataFrame so that:

        a   b   c
rdd1    1   2   0
rdd2    3   0   2

Is there a way to do so? Or do I need to change the way I create my rdd1 and rdd2?

Thank you
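
One possible approach (a minimal sketch, assuming an active SparkSession named spark; the column labels 'source', 'key' and 'count' are arbitrary names chosen here) is to tag every record with the name of its source RDD, union the two RDDs, convert the result to a DataFrame, and pivot the keys into columns, filling the missing combinations with 0:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    rdd1 = sc.parallelize([('a', 1), ('b', 2)])
    rdd2 = sc.parallelize([('a', 3), ('c', 2)])

    # Tag each record with the name of the RDD it came from, then union the two.
    tagged = (rdd1.map(lambda kv: ('rdd1', kv[0], kv[1]))
                  .union(rdd2.map(lambda kv: ('rdd2', kv[0], kv[1]))))

    df = tagged.toDF(['source', 'key', 'count'])

    # Pivot the keys into columns; absent (source, key) combinations become
    # null, so replace them with 0 afterwards.
    result = (df.groupBy('source')
                .pivot('key')
                .sum('count')
                .na.fill(0))

    result.show()
    # +------+---+---+---+
    # |source|  a|  b|  c|
    # +------+---+---+---+
    # |  rdd1|  1|  2|  0|
    # |  rdd2|  3|  0|  2|
    # +------+---+---+---+
    # (row order in show() is not guaranteed)

The pivot discovers the set of distinct keys at runtime, so the same sketch works however many keys the two RDDs contain.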

  • Can you share more info on how you are creating the RDDs? This link might give some ideas: https://stackoverflow.com/questions/50348236/convert-row-values-into-columns-with-its-value-from-another-column-in-spark-scal – Prabhanj Feb 25 '20 at 09:23
  • I create the RDDs by counting word frequencies in a document; each RDD corresponds to one document (a minimal word-count sketch follows below). – Ghifari Rahadian Feb 25 '20 at 14:17
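
For reference, a minimal word-count sketch along those lines, assuming sc is an active SparkContext and 'doc1.txt' is a hypothetical input file:

    counts1 = (sc.textFile('doc1.txt')                 # hypothetical document path
                 .flatMap(lambda line: line.split())   # split each line into words
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))     # (word, frequency) pairs

An RDD built this way plugs directly into the union-and-pivot sketch above.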
