Problem Statement
Below are an example and the expected result. The tree is described by 3 columns (the tree depth is dynamic), and the parent-child relationships are stored in those columns.
I need to follow these relationships and collapse them into one row per key using a PySpark RDD. Any ideas would be appreciated. Thank you.
Example RDD:
(null,a1,null) (null,a2,a1) (null,a3,a2) (null,a4,a3) (b1,null,a4)
Expected Result
Chain: b1 -> a4 -> a3 -> a2 -> a1, so the result RDD should be: (b1,(a4,a3,a2,a1))
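To make the expected transformation concrete, here is a minimal plain-Python sketch of the linking logic (not a PySpark implementation). It assumes each record is (key, node, parent): the child is whichever of the first two columns is non-null, and the third column is its parent. The helper names `build_parent_map`, `ancestor_chain` and `flatten` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed record layout: (key, node, parent); exactly one of key/node is set.
Record = Tuple[Optional[str], Optional[str], Optional[str]]


def build_parent_map(records: List[Record]) -> Dict[str, Optional[str]]:
    """Map each child (key or node column) to its parent (third column)."""
    parents: Dict[str, Optional[str]] = {}
    for key, node, parent in records:
        child = key if key is not None else node
        parents[child] = parent
    return parents


def ancestor_chain(start: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links from `start` until a node with no parent is reached."""
    chain: List[str] = []
    seen = {start}
    current = parents.get(start)
    while current is not None:
        if current in seen:  # guard against malformed, cyclic input
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain


def flatten(records: List[Record]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Produce one (key, ancestors) row for every record whose key column is set."""
    parents = build_parent_map(records)
    return [
        (key, tuple(ancestor_chain(key, parents)))
        for key, _, _ in records
        if key is not None
    ]


records = [
    (None, "a1", None),
    (None, "a2", "a1"),
    (None, "a3", "a2"),
    (None, "a4", "a3"),
    ("b1", None, "a4"),
]
print(flatten(records))  # [('b1', ('a4', 'a3', 'a2', 'a1'))]
```

In PySpark, one possible approach (assuming the parent map is small enough to fit in driver memory) would be to build it with `collectAsMap()` on a (child, parent) pair RDD, share it with `sc.broadcast`, and apply `ancestor_chain` inside `rdd.map` for the keyed rows. For very large trees, an iterative self-join on the parent column would avoid collecting the map.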