Given a large file containing data of the form (V1, V2, ..., VN), for example:
2,5
2,8,9
2,5,8
...
Using Spark, I am trying to produce a list of pairs with their counts, like the following:
((2,5),2)
((2,8),2)
((2,9),1)
((8,9),1)
((5,8),1)
I tried the suggestions mentioned in response to an older question, but I have encountered some issues. For example:
val dataRead = sc.textFile(inputFile)
val itemCounts = dataRead
  .flatMap(line => line.split(","))   // split each line into individual items
  .map(item => (item, 1))
  .reduceByKey((a, b) => a + b)
  .cache()
val nums = itemCounts.keys
  .filter(_.nonEmpty)                 // drop empty tokens
  .map(_.trim.toInt)
val pairs = nums.flatMap(x => nums.map(y => (x, y)))
I got the following error:
scala> val pairs = nums.flatMap(x => nums.map(y => (x,y)))
<console>:27: error: type mismatch;
found : org.apache.spark.rdd.RDD[(Int, Int)]
required: TraversableOnce[?]
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
^
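To make the goal concrete, here is a plain-Scala sketch (runnable without Spark) of the counting I am after. The sample `lines` value and the use of `combinations(2)` are just for illustration; in Spark I assume this logic would become a `flatMap` over the file's lines followed by `map(_ -> 1)` and `reduceByKey(_ + _)` instead of the local `groupBy` below:

```scala
// Plain-Scala sketch of per-line pair counting (no Spark required to run).
val lines = Seq("2,5", "2,8,9", "2,5,8")  // sample data from above

val pairCounts: Map[(Int, Int), Int] = lines
  .flatMap { line =>
    line.split(",").map(_.trim.toInt).toList.sorted  // sort so (5,2) == (2,5)
      .combinations(2)                               // unordered pairs within one line
      .map { case List(a, b) => (a, b) }
  }
  .groupBy(identity)                                 // local stand-in for reduceByKey
  .map { case (pair, occurrences) => (pair, occurrences.size) }

pairCounts.toSeq.sorted.foreach(println)
```

Note that the pairs are formed only within each line, not across the whole data set, which is why a full cross product of all items would not give the counts shown above.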
Could someone please point out what I am doing incorrectly, or suggest a better way to achieve this? Many thanks in advance.