If you really need RDDs, then you can get your result using subtract and union.
Assuming that you're interested in the differences from both sides, this will work:
val left = sc.makeRDD(Seq(("m1","p1"), ("m1","p2"), ("m1","p3"), ("m2","p1"), ("m2","p2"), ("m2","p3"), ("m2","p4")))
val right = sc.makeRDD(Seq(("m1","p1"), ("m1","p2"), ("m1","p3"), ("m2","p1"), ("m2","p2"), ("m2","p3"), ("m3","p1")))
val output = left.subtract(right).union(right.subtract(left))
output.collect() // Array[(String, String)] = Array((m2,p4), (m3,p1))
On the other hand, if you don't mind keeping the "full outer join" in memory, you could achieve the same using cogroup:
val output = left.cogroup(right).flatMap { case (k, (i1, i2)) =>
  val s1 = i1.toSet
  val s2 = i2.toSet
  val diff = (s1 diff s2) ++ (s2 diff s1)
  diff.toList.map(k -> _)
}
output.collect() // Array[(String, String)] = Array((m2,p4), (m3,p1))
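If you want to sanity-check the per-key logic without a Spark context, here is a minimal sketch of the same symmetric difference on plain Scala collections. The names symmetricDiff, leftPairs and rightPairs are made up for illustration and are not part of the Spark API; the sketch only models what the cogroup version computes and says nothing about how Spark executes it.

```scala
// Plain-Scala model of the per-key symmetric difference (no Spark needed).
// symmetricDiff is a hypothetical helper, not a Spark function.
def symmetricDiff[K, V](left: Seq[(K, V)], right: Seq[(K, V)]): Set[(K, V)] = {
  // Group values by key, similar in spirit to what cogroup hands to flatMap.
  val l = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }
  val r = right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }

  // Visit every key that appears on either side.
  (l.keySet ++ r.keySet).flatMap { k =>
    val s1 = l.getOrElse(k, Set.empty[V])
    val s2 = r.getOrElse(k, Set.empty[V])
    // Values present on exactly one side for this key.
    ((s1 diff s2) ++ (s2 diff s1)).map(k -> _)
  }
}

val leftPairs = Seq(("m1","p1"), ("m1","p2"), ("m1","p3"), ("m2","p1"), ("m2","p2"), ("m2","p3"), ("m2","p4"))
val rightPairs = Seq(("m1","p1"), ("m1","p2"), ("m1","p3"), ("m2","p1"), ("m2","p2"), ("m2","p3"), ("m3","p1"))

val result = symmetricDiff(leftPairs, rightPairs)
// result contains exactly (m2,p4) and (m3,p1)
```

Because the values go through toSet, this model (like the cogroup version) returns each differing pair once, even if it occurs several times in the input.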