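import org.apache.spark.rdd.RDD

// Returns true when the score is below the threshold of 200.
def isSmallerScore(value: Int): Boolean = {
  val threshold = 200
  value < threshold
}
val rdd = sc.parallelize(Seq(("Java", 100), ("Python", 200), ("Scala", 300)))
// Pairs whose score is below the threshold
val result1: RDD[(String, Int)] = rdd.filter(x => isSmallerScore(x._2))
// Pairs whose score is at or above the threshold
val result2: RDD[(String, Int)] = rdd.filter(x => !isSmallerScore(x._2))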

Using filter in the code above, I create two RDDs: one with the smaller scores and another with the higher scores. To separate them, I apply the filter twice.

Is it possible to do this with a single filter? How can I avoid the second filter pass to obtain the result (either result1 or result2)?

thebluephantom
Tulasi
  • [This post](https://stackoverflow.com/a/32971246/14165730) says "it is not possible to yield multiple RDDs from a single transformation" – mck Feb 22 '21 at 17:15
  • You mean it's not possible to avoid running the same check (isSmallerScore) multiple times? – Tulasi Feb 22 '21 at 17:30

1 Answer


Spark is not an ETL tool like Informatica BDM, Talend, Pentaho, et al., where you have multiple pipelines running in parallel (branches) that you can create graphically.

You need to cache rdd and filter twice to get 2 RDDs.
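
A minimal sketch of the cache-and-filter-twice approach, assuming a local SparkSession. The object name `SplitByScore` and the helper `split` are hypothetical names chosen for illustration. Caching does not remove the second filter; it only means the parent RDD is computed once and then read back from memory by both filters.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only for this illustration.
object SplitByScore {
  def isSmallerScore(value: Int): Boolean = value < 200

  // Marks the input for caching, then derives both halves with two filters.
  // cache() is lazy: the data is materialized by the first action and
  // reused by later ones instead of recomputing the upstream lineage.
  def split(rdd: RDD[(String, Int)]): (RDD[(String, Int)], RDD[(String, Int)]) = {
    val cached = rdd.cache()
    val smaller = cached.filter(x => isSmallerScore(x._2))
    val larger = cached.filter(x => !isSmallerScore(x._2))
    (smaller, larger)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is used so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-by-score")
      .getOrCreate()
    val rdd = spark.sparkContext
      .parallelize(Seq(("Java", 100), ("Python", 200), ("Scala", 300)))

    val (smaller, larger) = split(rdd)
    println(smaller.collect().toList) // List((Java,100))
    println(larger.collect().toList)  // List((Python,200), (Scala,300))

    spark.stop()
  }
}
```

The predicate is still evaluated once per element in each filter, but that cost is usually small next to recomputing an expensive parent RDD.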

thebluephantom