
I have data in this format:

100 1 2 3 4 5

I use the following code to load it:

 val data: RDD[(String, Array[Int])] = sc.textFile("data.txt").map(_.split("\t"))
   .map(fields => (fields(0), fields(1).split(" ").map(_.toInt)))

I want to generate pairs from the Array[Int] such that each element greater than a threshold (2 in the code below) is paired with every other element of the array. I will then use these pairs to compute further statistics. For example, with the sample data I should first be able to generate this:

100 (3,1), (3,2), (3,4), (3,5),(4,1), (4,2), (4,3), (4,5)

val test = merged_data.mapValues { case x =>
      for (element <- x) {
        val y = x.filter(_ != element)

        if (element > 2)
          {

            for (yelement <- y)
              {
                (element, yelement)
              }
          }
      }
      }

Here is the output that I get: Array[(String, Unit)] = Array((100,())). I am not sure why it is empty.

Once this is resolved, I will sort the elements within each tuple and remove any duplicates, so that the output above

100 (3,1), (3,2), (3,4), (3,5),(4,1), (4,2), (4,3), (4,5)

becomes this:

100 (1,3), (2,3), (3,4), (3,5), (1,4), (2,4), (4,5)

user3803714

2 Answers


I was able to resolve this as:

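  val test = merged_data.mapValues { case x =>
    // A for loop without yield returns Unit, so collect the pairs as text instead
    val sb = new StringBuilder

    for (element <- x) {
      val y = x.filter(_ != element)

      if (element > 2) {
        for (yelement <- y) {
          // Append each pair; without this the builder stays empty
          sb.append((element, yelement))
        }
      }
    }
    sb.toString()
  }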
ccheneson
user3803714

How about something like:

val test = data.mapValues { x =>
    for {
        element <- x.filter(_ > 2);
        yelement <- x.filter(_ != element)
    } yield (element, yelement)
}
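
For the follow-up step in the question (ordering each pair and removing duplicates), here is a minimal sketch on a plain Scala array, with no Spark involved. The names `PairDemo`, `normalizedPairs` and `threshold` are hypothetical and only serve the illustration.

```scala
object PairDemo {
  // Hypothetical helper: pair every element above `threshold` with every
  // other element, put the smaller value first, then drop duplicate pairs.
  def normalizedPairs(xs: Array[Int], threshold: Int): Array[(Int, Int)] = {
    val pairs = for {
      element <- xs.filter(_ > threshold)
      other   <- xs.filter(_ != element)
    } yield (if (element < other) (element, other) else (other, element))

    // distinct keeps the first occurrence of each pair
    pairs.distinct
  }
}
```

Assuming `data` is the `RDD[(String, Array[Int])]` from the question, this could be applied as `data.mapValues(PairDemo.normalizedPairs(_, 2))`. Note that with the sample data 5 is also greater than 2, so its pairs appear in the result as well.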

Also you might want to check out: Nested iteration in Scala, which answers why you got an empty result.
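
To illustrate why the original attempt produced `()`, here is a small sketch on a plain array (the names `YieldDemo`, `noYield` and `withYield` are made up for this example). A `for` without `yield` is translated into `foreach` calls, whose result is `Unit`, so the tuples are built and then thrown away. Only the `yield` form collects them.

```scala
object YieldDemo {
  val xs = Array(1, 2, 3, 4, 5)

  // No yield: this is nested foreach, so the whole expression has type Unit
  val noYield: Unit =
    for (element <- xs) {
      if (element > 2) {
        for (other <- xs.filter(_ != element)) {
          (element, other) // created, then discarded
        }
      }
    }

  // With yield: the comprehension returns the collected pairs
  val withYield: Array[(Int, Int)] =
    for {
      element <- xs.filter(_ > 2)
      other   <- xs.filter(_ != element)
    } yield (element, other)
}
```

Inside `mapValues`, the first form is what yields `(100,())`, because the value returned for each key is `Unit`.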

Community
Dennis Hunziker