
For example, suppose we have the string "abcdabcd"

and we want to count all the pairs (e.g. "ab" or "da") that occur in the string.

How do we do that in Apache Spark?

I'm asking because it looks like RDD does not support a sliding function:

rdd.sliding(2).toList
// Count the number of pairs in the resulting list
// Fails to compile: sliding is not a member of RDD

1 Answer


Apparently sliding is supported via MLlib's RDDFunctions, as shown by zero323 here:

import org.apache.spark.mllib.rdd.RDDFunctions._

val str = "abcdabcd"

// Parallelize the characters of the string into an RDD[Char]
val rdd = sc.parallelize(str)

// Slide a window of size 2 over the characters and print each overlapping pair
rdd.sliding(2).map(_.mkString).toLocalIterator.foreach(println)

will show

ab
bc
cd
da
ab
bc
cd
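
If you actually need counts rather than just the pairs, a minimal follow-up sketch (assuming you want one count per distinct pair) is to feed the same sliding RDD into countByValue, which returns a Map[String, Long] on the driver:

// Count how many times each distinct pair occurs
val pairCounts = rdd.sliding(2).map(_.mkString).countByValue()

pairCounts.foreach { case (pair, count) => println(s"$pair -> $count") }

// Or, if you only need the total number of pairs:
val totalPairs = rdd.sliding(2).count()

Note that countByValue collects the result to the driver, which is fine here since the number of distinct pairs is small.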
