0

I want to implement some sequential algorithm on RDD.

For example:

val conf = new SparkConf()
conf.setMaster("local[2]").
  setAppName("SequentialSuite")
val sc = new SparkContext(conf)
val rdd = sc.
  parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
  sortBy(x => x, true)
rdd.foreach(println)

I want to see the ordered number on my screen, but it shows unordered integers. The two partitions execute the println simultaneously.

How do I make the RDD execute a function globally sequential?

wush978
  • 3,114
  • 2
  • 19
  • 23

1 Answers1

1

I found the answer according to Spark: Best practice for retrieving big data from RDD to local machine:

val rdd : RDD[Int] = sc.parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9)).sortBy(x => x, true)
for(p <- rdd.partitions) {
    val partrdd = rdd.mapPartitionsWithIndex((i : Int, iter : Iterator[Int]) => if (i == p.index) iter else Iterator(), true)
    partrdd.foreach(println)
}
Community
  • 1
  • 1
wush978
  • 3,114
  • 2
  • 19
  • 23