Is there an RDD method like take that does not bring all the elements into driver memory? For example, I may need to take 10^9 elements of my RDD and keep the result as an RDD. What is the best way to do that?
EDIT: One solution could be to use zipWithIndex and filter on index < aBigValue, but I am pretty sure there is a better solution.
EDIT 2: The code would look like this:
sc.parallelize(1 to 100, 2).zipWithIndex().filter(_._2 < 10).map(_._1)
That is a lot of operations just to reduce the size of an RDD :-(
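One possible alternative is sketched below. This is not a built-in Spark method; the names TakeAsRDD, takeAsRDD and partitionQuotas are hypothetical. It assumes "first n" means first in partition order (the same order zipWithIndex uses) and that no single partition holds more than Int.MaxValue elements. The idea: collect only one count per partition to the driver, work out how many elements each partition must contribute, then let each partition emit just that many with mapPartitionsWithIndex. This avoids building a (value, index) tuple for every element.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object TakeAsRDD {
  // Pure helper: given the element count of each partition (in partition
  // order), return how many elements to keep from each partition so that
  // exactly the first n elements overall are kept.
  def partitionQuotas(counts: Array[Long], n: Long): Array[Long] = {
    var remaining = math.max(n, 0L)
    counts.map { c =>
      val q = math.min(c, remaining)
      remaining -= q
      q
    }
  }

  // Keep the first n elements of `rdd` (in partition order) as a new RDD.
  // Only one Long per partition is collected to the driver.
  def takeAsRDD[T: ClassTag](rdd: RDD[T], n: Long): RDD[T] = {
    // Extra job: count each partition's elements without keeping them.
    val counts: Array[Long] = rdd
      .mapPartitions(it => Iterator.single(it.foldLeft(0L)((acc, _) => acc + 1)))
      .collect()
    val quotas = partitionQuotas(counts, n)
    // Lazy transformation: each partition yields only its quota.
    // Assumes each quota fits in an Int (Iterator.take takes an Int).
    rdd.mapPartitionsWithIndex((i, it) => it.take(quotas(i).toInt))
  }
}
```

For example, TakeAsRDD.takeAsRDD(sc.parallelize(1 to 100, 2), 10) should give an RDD containing 1 to 10, since the first partition holds 1 to 50 and gets a quota of 10 while the second gets 0. Like zipWithIndex, this triggers an extra Spark job and evaluates the parent RDD twice, so caching the input may be worthwhile if it is expensive to compute.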