NESTED PARALLELIZATIONS?
Let's say I am trying to do the equivalent of "nested for loops" in Spark. In a regular language, say I have a routine in the inner loop that estimates Pi the way the Pi Average Spark example does (see Estimating Pi):
int iLimit = 1000;                  // number of Pi estimates
int jLimit = 1000000;               // samples per estimate (the 10^6 inner loop)
double counter = 0.0;
for (int i = 0; i < iLimit; i++)
    for (int j = 0; j < jLimit; j++)
        counter += PiEstimator();   // accumulate one sample's contribution
double estimateOfAllAverages = counter / ((double) iLimit * jLimit);
Can I nest parallelize calls in Spark? I have been trying but have not worked out the kinks yet. I would be happy to post errors and code, but I think I am asking a more conceptual question about whether this is the right approach in Spark.
I can already parallelize a single Spark example / Pi estimate; now I want to do that 1000 times to see whether the results converge on Pi. (This relates to a larger problem we are trying to solve; if something closer to an MCVE is needed, I'd be happy to add it.)
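For context, the single parallelized estimate I already have is essentially the standard SparkPi example. A minimal sketch of that piece, with numSamples and slices as illustrative values rather than what we actually use:

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("SinglePiEstimate"))
val numSamples = 1000000   // the 10^6 samples behind one estimate
val slices = 8             // partition count, chosen arbitrarily here

// count how many random points in the unit square land inside the unit circle
val inside = sc.parallelize(1 to numSamples, slices).map { _ =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)

val piEstimate = 4.0 * inside / numSamples   // one Pi estimate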
BOTTOM LINE QUESTION: I just need someone to answer directly: is using nested parallelize calls the right approach? If not, please advise something specific. Thanks! Here is pseudo-code for what I think the right approach would be:
// use an accumulator to keep track of each Pi estimate result
sparkContext.parallelize(arrayOf1000, slices).map { _ =>
    // nested call: one inner job per outer element
    sparkContext.parallelize(arrayOf10^6, slices).map { _ =>
        // do the 10^6-sample work here and update the accumulator with each result
    }
}
// take the average of the accumulator to see if all 1000 Pi estimates converge on Pi
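For reference, this is the kind of accumulator plumbing those comments describe, sketched with a single (non-nested) parallelize purely so the snippet runs as-is. The names (trials, samplesPerTrial, piSum) are mine, it assumes the Spark 2.x doubleAccumulator API, and it deliberately leaves open whether the per-trial work could instead be a nested parallelize call, which is the question above:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("PiConvergenceSketch"))
val trials = 1000                 // the "arrayOf1000" outer dimension
val samplesPerTrial = 1000000     // the 10^6 inner dimension
val piSum = sc.doubleAccumulator("piSum")   // running sum of per-trial estimates

sc.parallelize(1 to trials, 100).foreach { _ =>
  // one full 10^6-sample estimate per task, done with a plain local loop
  var inside = 0
  var j = 0
  while (j < samplesPerTrial) {
    val x = scala.math.random * 2 - 1
    val y = scala.math.random * 2 - 1
    if (x * x + y * y <= 1) inside += 1
    j += 1
  }
  piSum.add(4.0 * inside / samplesPerTrial)
}

// average of the 1000 estimates; it should sit close to math.Pi if things converge
val estimateOfAllAverages = piSum.sum / trials
println(s"mean of $trials estimates = $estimateOfAllAverages (Pi = ${math.Pi})")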
BACKGROUND: I had asked this question and got a general answer, but it did not lead to a solution; after some waffling I decided to post a new question with a different characterization. I also tried asking this on the Spark user mailing list, but no dice there either. Thanks in advance for any help.