I'm writing a Spark application using Scala. I have the following two RDDs:
(a, 1, some_values1)
(b, 1, some_values2)
(c, 1, some_values3)
and
(a, 2, some_values1)
(b, 2, some_values2)
(a, 3, some_values1)
(b, 3, some_values2)
I'm trying to get this output:
(a, 1, 2, computed_values1)
(b, 1, 2, computed_values2)
(c, 1, 2, None)
(a, 1, 3, computed_values1)
(b, 1, 3, computed_values2)
(c, 1, 3, None)
So, the letters are used to match each record from the first RDD with records in the second one. I tried using the join method, but it didn't work for record c, which has no matching record in the second RDD. How can I achieve this?
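A minimal sketch of why a plain join loses c. It uses plain Scala collections as stand-ins for the two RDDs, and `innerJoin` and `leftOuterJoin` below are hypothetical reimplementations written for illustration, not Spark's code. An inner join keeps only keys present on both sides. A left outer join keeps every left key and pads a missing match with None. Spark's pair RDDs provide `join` and `leftOuterJoin` with these two behaviours.

```scala
object JoinDemo {
  // Stand-ins for the two RDDs, keyed by letter: (letter, (group id, values)).
  val first: Seq[(String, (Int, String))] = Seq(
    ("a", (1, "some_values1")),
    ("b", (1, "some_values2")),
    ("c", (1, "some_values3")))

  val second: Seq[(String, (Int, String))] = Seq(
    ("a", (2, "some_values1")),
    ("b", (2, "some_values2")),
    ("a", (3, "some_values1")),
    ("b", (3, "some_values2")))

  // Hypothetical helper, inner join: keeps only keys present on both sides,
  // so "c" disappears from the result.
  def innerJoin[K, V, W](l: Seq[(K, V)], r: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v) <- l
      (k2, w) <- r
      if k == k2
    } yield (k, (v, w))

  // Hypothetical helper, left outer join: every left key survives;
  // a key with no match on the right is paired with None.
  def leftOuterJoin[K, V, W](l: Seq[(K, V)], r: Seq[(K, W)]): Seq[(K, (V, Option[W]))] =
    l.flatMap { case (k, v) =>
      val matches: Seq[Option[W]] = r.collect { case (k2, w) if k2 == k => Some(w) }
      val padded: Seq[Option[W]] = if (matches.isEmpty) Seq(None) else matches
      padded.map(w => (k, (v, w)))
    }
}
```

A left outer join on the letter alone still produces a single c row rather than one row per group id (2 and 3), so the join key probably has to include the group id as well.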
UPDATE
Another example, in which the second RDD also contains a record for c:
(a, 1, some_values1)
(b, 1, some_values2)
(c, 1, some_values3)
and
(a, 2, some_values1)
(b, 2, some_values2)
(a, 3, some_values1)
(b, 3, some_values2)
(c, 3, some_values2)
I'm trying to get this output:
(a, 1, 2, computed_values1)
(b, 1, 2, computed_values2)
(c, 1, 2, None)
(a, 1, 3, computed_values1)
(b, 1, 3, computed_values2)
(c, 1, 3, computed_values3)
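A minimal sketch of the semantics both examples seem to describe. It assumes that every record of the first RDD should be paired with each distinct second-field value (group id) found in the second RDD, then looked up by (letter, group id), yielding None when there is no match. Plain Scala collections stand in for RDDs, `compute` is a hypothetical placeholder because the question does not say how computed_values are derived, and the lookup assumes each (letter, group id) pair occurs at most once in the second RDD. In Spark, the same shape could presumably be built with `distinct`, `cartesian` and `leftOuterJoin` on pair RDDs.

```scala
object ExpandDemo {
  // (letter, group id, values), as in both RDDs of the question.
  type Rec = (String, Int, String)

  // Hypothetical placeholder: the real computation is not given in the question.
  def compute(v1: String, v2: String): String = s"$v1|$v2"

  def expand(first: Seq[Rec], second: Seq[Rec]): Seq[(String, Int, Int, Option[String])] = {
    // Index the second collection by (letter, group id).
    // Assumes each (letter, group id) pair appears at most once.
    val byKeyAndGroup: Map[(String, Int), String] =
      second.map { case (k, g, v) => ((k, g), v) }.toMap
    // Every group id present in the second collection (2 and 3 in the examples).
    val groups: Seq[Int] = second.map(_._2).distinct.sorted
    // Pair each first-collection record with each group id, then look it up;
    // a missing (letter, group id) yields None instead of dropping the row.
    for {
      g <- groups
      (k, i, v1) <- first
    } yield (k, i, g, byKeyAndGroup.get((k, g)).map(v2 => compute(v1, v2)))
  }
}
```

Because the group ids are collected first and every record of the first collection is crossed with them, c appears once per group id, with None only where the second collection has no matching (letter, group id) record.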