Hi, I'm working with SparkR in YARN mode.
When I submit an application this way:
./spark-submit --master yarn-client --packages com.databricks:spark-csv_2.10:1.0.3 \
  --driver-memory 6g --num-executors 8 --executor-memory 6g \
  --total-executor-cores 32 --executor-cores 8 \
  /home/sentiment/Scrivania/test3.R
One node starts as the AM (I think it is chosen randomly) and takes 1 GB of memory and 1 vcore. After that, every node shows 7 GB of memory and 1 vcore in use (except the node running the AM, which shows 8 GB and 2 vcores).
Why don't the nodes acquire 4 cores, as my configuration and the spark-submit options say?
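In case it helps, the per-node usage above can be checked from the ResourceManager web UI (server1:8088, Nodes page) or with the standard YARN CLI; a quick sketch (the node id below is only a placeholder, taken from the node list):

yarn node -list -all
yarn node -status server2:45454   # shows memory and vcore usage/capacity for that node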
spark-defaults.conf
spark.master spark://server1:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.executor.memory 6g
spark.executor.cores 4
spark.akka.frameSize 1000
spark.yarn.am.cores 4
spark.kryoserializer.buffer.max 700m
spark.kryoserializer.buffer 100m
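For completeness, the same per-executor core setting can also be passed directly on the command line instead of relying on spark-defaults; a sketch, assuming 4 cores per executor is what I'm after:

./spark-submit --master yarn-client --packages com.databricks:spark-csv_2.10:1.0.3 \
  --driver-memory 6g --num-executors 8 --executor-memory 6g --executor-cores 4 \
  /home/sentiment/Scrivania/test3.R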
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>server1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>server1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>server1:8050</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>server1:8088</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>4</value>
  </property>
</configuration>
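For context, the cluster-wide memory and vcore totals the ResourceManager is actually working with can be read back from its REST API (a sketch, assuming curl is available; the webapp address is the one configured above):

curl -s http://server1:8088/ws/v1/cluster/metrics
# the JSON response includes fields such as totalVirtualCores, allocatedVirtualCores and availableVirtualCores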
Update 1:
I read in an old post that I needed to change the value of the property below from the DefaultResourceCalculator to the DominantResourceCalculator in capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
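For reference, after editing capacity-scheduler.xml the scheduler configuration needs to be reloaded (or the ResourceManager restarted) before the change takes effect; a sketch, assuming the standard Hadoop CLI and sbin layout:

yarn rmadmin -refreshQueues                            # reloads capacity-scheduler.xml
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager  # or restart the ResourceManager
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager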
I also added this to spark-env.sh:
SPARK_EXECUTOR_CORES=4
Nothing changed.
Update 2: I read the following in the official Spark documentation. So is 1 core per executor the maximum in YARN mode?
spark.executor.cores The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.