I have data in this format (a key, then a tab, then space-separated integers):
100 1 2 3 4 5
I use the following code to load it:
val data: RDD[(String, Array[Int])] = sc.textFile("data.txt").map { line =>
  val fields = line.split("\t")  // fields(0) is the key, fields(1) holds the values
  (fields(0), fields(1).split(" ").map(_.toInt))
}
I want to generate pairs from the Array[Int] such that every element whose value is greater than a threshold (2 in the code below) is paired with each of the other elements of the array. I will then use these pairs to compute further statistics. For example, with the sample data, I should first be able to generate this:
100 (3,1), (3,2), (3,4), (3,5), (4,1), (4,2), (4,3), (4,5)
This is the code I tried:
val test = merged_data.mapValues { case x =>
  for (element <- x) {
    val y = x.filter(_ != element)
    if (element > 2) {
      for (yelement <- y) {
        (element, yelement)
      }
    }
  }
}
Here is the output that I get: Array[(String, Unit)] = Array((100,())). I am not sure why the value is empty.
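A likely explanation is that a Scala `for` loop without `yield` runs only for its side effects and evaluates to Unit, so the tuple `(element, yelement)` is built and then discarded. Below is a minimal sketch of the pairing step written with `for ... yield`, shown on a plain Array so that it runs without Spark. The names `pairsAbove` and `threshold` are hypothetical and do not appear in the code above. Note that with a threshold of 2 the element 5 also qualifies, so this sketch also returns the pairs that start with 5.

```scala
// Hypothetical helper (not part of the original code): pairs every element
// greater than `threshold` with every element that has a different value.
def pairsAbove(xs: Array[Int], threshold: Int): Array[(Int, Int)] =
  for {
    element <- xs if element > threshold   // keep only elements above the threshold
    other   <- xs if other != element      // same filter as x.filter(_ != element)
  } yield (element, other)                 // yield collects the tuples into a result

val sample = Array(1, 2, 3, 4, 5)
val pairs = pairsAbove(sample, 2)
println(pairs.mkString(", "))
// prints (3,1), (3,2), (3,4), (3,5), (4,1), (4,2), (4,3), (4,5), (5,1), (5,2), (5,3), (5,4)
```

Assuming merged_data has the type RDD[(String, Array[Int])], the same function could be applied per key with merged_data.mapValues(pairsAbove(_, 2)).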
Once this is resolved, I want to sort the two elements within each tuple and remove any duplicates, so that the output above
100 (3,1), (3,2), (3,4), (3,5), (4,1), (4,2), (4,3), (4,5)
becomes this:
100 (1,3), (2,3), (3,4), (3,5), (1,4), (2,4), (4,5)
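The sort-and-deduplicate step described above could be sketched as follows, again on a plain Array rather than an RDD. The name `normalize` is hypothetical; the input is the list of pairs from the example, typed in by hand. `distinct` keeps the first occurrence of each pair, so the original order is preserved.

```scala
// Hypothetical helper (not part of the original code): orders each pair so
// the smaller value comes first, then drops repeated pairs.
def normalize(pairs: Array[(Int, Int)]): Array[(Int, Int)] =
  pairs
    .map { case (a, b) => if (a <= b) (a, b) else (b, a) } // sort within the tuple
    .distinct                                              // (3,4) and (4,3) collapse to one

val input = Array((3, 1), (3, 2), (3, 4), (3, 5), (4, 1), (4, 2), (4, 3), (4, 5))
val normalized = normalize(input)
println(normalized.mkString(", "))
// prints (1,3), (2,3), (3,4), (3,5), (1,4), (2,4), (4,5)
```

On an RDD keyed as in the question, this could be chained after the pairing step with another mapValues call, assuming the values are Array[(Int, Int)].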