I have an RDD[(String, Array[String])] and I need to replicate the data inside it to increase its size.
I've read here https://stackoverflow.com/a/41787801/9759150 that when sampling with replacement you can get the same element in the sample more than once.
For example: if RDD.count() is, let's say, 35 elements, and I need to generate from it an RDD with 200 elements, how can I do this?
I saw that sample is applied like this:
val sampledRDD = rdd.sample(true, fraction, seed)
I do not know how to choose the fraction parameter for my problem.
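To make the question concrete, here is a minimal sketch of one possible way to pick the value, not a confirmed solution. It assumes that, when sampling with replacement, fraction is the expected number of times each element is drawn, so target / count gives the expected output size (about 5.71 for 35 to 200 elements). The helper oversampleFraction, the object name, the toy data, the local master, and the seed are all hypothetical and only for illustration. Because sample is random, the resulting count is only approximately 200; takeSample is shown as an alternative that returns exactly the requested number of elements, at the cost of collecting them to the driver as an Array.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object OversampleSketch {
  // Hypothetical helper: with replacement, `fraction` is the expected number of
  // times each element is drawn, so target / count is the expected growth factor.
  def oversampleFraction(currentCount: Long, targetCount: Long): Double = {
    require(currentCount > 0, "source RDD must not be empty")
    targetCount.toDouble / currentCount
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session, only for trying the idea out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("oversample-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy stand-in for the real RDD[(String, Array[String])]: 35 elements.
    val rdd: RDD[(String, Array[String])] =
      sc.parallelize((1 to 35).map(i => (s"key$i", Array(s"value$i"))))

    val target = 200L
    val fraction = oversampleFraction(rdd.count(), target) // 200 / 35

    // The size of this RDD is only approximately `target`, since sampling is random.
    val sampledRDD = rdd.sample(withReplacement = true, fraction = fraction, seed = 42L)
    println(s"approximate size: ${sampledRDD.count()}")

    // takeSample returns exactly `num` elements, but as a local Array on the driver,
    // so it is re-parallelized here; this only suits small targets.
    val exactArray = rdd.takeSample(withReplacement = true, num = target.toInt, seed = 42L)
    val exactRDD = sc.parallelize(exactArray)
    println(s"exact size: ${exactRDD.count()}")

    spark.stop()
  }
}
```

If the exact count matters and the target is too large to collect on the driver, the sample variant with a slightly larger fraction followed by trimming might be an option, but that is an untested idea rather than something the linked answer states.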
Thank you!