Suppose that I have the following Java code:
SparkConf sparkConf = new SparkConf().setAppName("myApp");
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
JavaRDD<A> firstRDD = sparkContext.parallelize(B, 2);
JavaRDD<A> secondRDD = firstRDD.map(runSomethingAndReturnSomething());
List<A> objectA = secondRDD.collect(); // collect() returns a List<A>, not a single A
doSomethingWithA(objectA);
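For reference, here is a minimal compilable version of the same snippet. A, B, runSomethingAndReturnSomething and doSomethingWithA are placeholders in my real code, so in this sketch I just assume A is String and B is a List<String>:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyApp {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("myApp");
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);

        // stand-in for the B object: some local data split into 2 partitions
        List<String> b = Arrays.asList("one", "two", "three", "four");
        JavaRDD<String> firstRDD = sparkContext.parallelize(b, 2);

        // stand-in for runSomethingAndReturnSomething(): a simple map function
        JavaRDD<String> secondRDD = firstRDD.map(s -> s.toUpperCase());

        // collect() returns a List with the elements of all partitions
        List<String> collected = secondRDD.collect();

        // stand-in for doSomethingWithA(): just print the results
        System.out.println(collected);

        sparkContext.stop();
    }
}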
I want to run this code in cluster mode, so I use spark-submit and start a master and a slave.
As I understand it (correct me if I'm wrong), this is what should happen:
- The Spark context is started in the driver (master).
- I tell the master that I want to use the B object, split into two partitions, in parallel.
- The master sends the command (map) to the workers, but they don't execute it yet.
- Finally, when I call collect, the workers actually run the map transformation, and when they finish they send the results back to the master (see the sketch after this list).
- I do something with the collected results on the master.
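If I understand the laziness in steps 3 and 4 correctly, it can be seen with a toy example like this (the Integer data and print statements are just for illustration):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LazyDemo {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("lazyDemo"));
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);

        // step 3: the map is only recorded here, nothing runs yet
        JavaRDD<Integer> doubled = numbers.map(x -> {
            System.out.println("processing " + x); // appears in the executor logs, only after an action
            return x * 2;
        });
        System.out.println("map() returned, but no element has been processed yet");

        // step 4: collect() is an action, so the workers now run the map
        // and send their partitions' results back
        List<Integer> result = doubled.collect();

        // step 5: the collected results are a plain local List on the driver
        System.out.println("received: " + result);

        sc.stop();
    }
}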
The issue is that the collect is apparently being done on the slave node and not on the master node. Why is this happening?
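For what it's worth, this is how I would check where each step actually runs, by printing the hostname inside the map function and again after collect (the hostname lookups are only diagnostics I added, not part of my real code):

import java.net.InetAddress;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class WhereDoesItRun {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("whereDoesItRun"));
        JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c", "d"), 2);

        JavaRDD<String> upper = rdd.map(s -> {
            // runs inside an executor, so this prints the worker's hostname
            // (visible in the executor's stdout log, not on the driver console)
            System.out.println("map ran on: " + InetAddress.getLocalHost().getHostName());
            return s.toUpperCase();
        });

        // collect() ships the results of all partitions back to the driver process
        List<String> collected = upper.collect();

        // runs in the driver, so this prints the driver's hostname
        System.out.println("collect returned on: " + InetAddress.getLocalHost().getHostName());
        System.out.println(collected);

        sc.stop();
    }
}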