As the title describes, say I have two RDDs:
rdd1 = sc.parallelize([1,2,3])
rdd2 = sc.parallelize([1,0,0])
or
rdd3 = sc.parallelize([("Id", 1),("Id", 2),("Id",3)])
rdd4 = sc.parallelize([("Result", 1),("Result", 0),("Result", 0)])
How can I create the following DataFrame?
Id Result
1 1
2 0
3 0
If I could create the paired RDD [(1,1),(2,0),(3,0)], then sqlCtx.createDataFrame
would give me what I want, but I don't know how to build that paired RDD.
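For concreteness, here is a minimal sketch of the shape I have in mind. It assumes RDD.zip is the right tool, which as far as I know requires both RDDs to have the same number of partitions and the same number of elements in each partition. The helper names to_id_result_df and run_example are made up for illustration, and the keyed variant assumes the constant "Id"/"Result" keys can simply be dropped with values().

```python
def to_id_result_df(sql_ctx, id_rdd, result_rdd):
    """Pair two equally sized RDDs element-wise and build a DataFrame.

    Hypothetical helper, not part of PySpark.
    """
    # Assumption: both RDDs have the same number of partitions and the
    # same number of elements per partition, as RDD.zip expects.
    paired = id_rdd.zip(result_rdd)  # RDD of (id, result) tuples
    return sql_ctx.createDataFrame(paired, ["Id", "Result"])


def run_example():
    """Needs a local PySpark install; not called automatically."""
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local[1]", "zip-sketch")
    sqlCtx = SQLContext(sc)
    try:
        # Plain RDDs of values.
        rdd1 = sc.parallelize([1, 2, 3])
        rdd2 = sc.parallelize([1, 0, 0])
        to_id_result_df(sqlCtx, rdd1, rdd2).show()

        # Keyed RDDs: drop the constant keys first, then zip the values.
        rdd3 = sc.parallelize([("Id", 1), ("Id", 2), ("Id", 3)])
        rdd4 = sc.parallelize([("Result", 1), ("Result", 0), ("Result", 0)])
        to_id_result_df(sqlCtx, rdd3.values(), rdd4.values()).show()
    finally:
        sc.stop()
```

Calling run_example() in a session with PySpark available should exercise both variants. Whether zip is safe for RDDs that were not created the same way is exactly the part I am unsure about.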
I'd appreciate any comment or help!