I'm running a Spark application on a Dataproc cluster of 4 n1-standard-16 machines (3 primary workers and 1 secondary worker).
When the cluster is idle I can see 16 vCores available, which is what I expect.
But while my Spark application is running, the vCores-used count climbs above 16, to 32. Any idea why this is happening? Is it because of threading?
If it is because of threading, how do I control it, and how can I make the most of the cores I have?
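My rough understanding so far: each executor runs up to spark.executor.cores concurrent tasks (divided by spark.task.cpus), so the vCores YARN reports as used should be roughly executors × spark.executor.cores. If that's right, I'd expect the knobs to be something like the sketch below (these are standard Spark property names, but the values are placeholders for illustration, not a recommendation):

    # sketch: cap per-executor concurrency and the total executor count
    # (placeholder values, just to show which properties I think matter)
    --properties=spark.executor.cores=2,\
    spark.task.cpus=1,\
    spark.dynamicAllocation.maxExecutors=8

Is that the right set of knobs, or is there a YARN-side setting I'm missing?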
Note: I have already switched the YARN scheduler type (as shown in the YARN UI) to FairScheduler.
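From what I've read, YARN's DefaultResourceCalculator places containers by memory only, so the vCores number in the UI is just the sum of what the containers asked for and can exceed the physical core count. If that's what I'm seeing, I assume the CPU-aware alternative is DominantResourceCalculator, set at cluster creation, roughly like this (untested sketch; the cluster name is a placeholder):

    # sketch: ask YARN to account for CPU as well as memory when placing containers
    # ("my-cluster" is a placeholder; prefix follows Dataproc's cluster properties convention)
    gcloud dataproc clusters create my-cluster \
        --properties=capacity-scheduler:yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

Though since I've switched to FairScheduler, I'm not sure this capacity-scheduler property still applies; pointers welcome.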
My spark request (one comma-separated --properties flag, wrapped here for readability):

    --properties=spark.submit.deployMode=cluster,
                 spark.hadoop.hive.exec.dynamic.partition=true,
                 spark.sql.hive.convertMetastoreOrc=true,
                 spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict,
                 spark.shuffle.service.enabled=true,
                 spark.dynamicAllocation.enabled=true,
                 spark.dynamicAllocation.minExecutors=30,
                 spark.dynamicAllocation.maxExecutors=180,
                 spark.dynamicAllocation.executorIdleTimeout=60s,
                 spark.executor.instances=70,
                 spark.executor.cores=3,
                 spark.serializer=org.apache.spark.serializer.KryoSerializer,
                 spark.sql.shuffle.partitions=220,
                 spark.executor.memory=3g,
                 spark.driver.memory=2g,
                 spark.yarn.executor.memoryOverhead=1g
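For what it's worth, my own back-of-the-envelope math on that request (the ~48 GB of YARN memory per node is an assumption I haven't verified against yarn.nodemanager.resource.memory-mb):

    # cores requested vs. the 16 vCores shown when idle
    spark.dynamicAllocation.minExecutors = 30 -> 30 x 3 cores =  90 vCores minimum
    spark.executor.instances             = 70 -> 70 x 3 cores = 210 vCores
    # memory footprint per executor
    3g (spark.executor.memory) + 1g (spark.yarn.executor.memoryOverhead) = 4g
    # assuming ~48 GB of YARN memory per n1-standard-16 worker (NOT verified):
    48g / 4g = 12 executors per node -> 12 x 3 = 36 vCores packed onto a 16-core node

So if YARN is only counting memory when it places containers, it would happily put far more requested cores on a node than the node physically has, which would match what I'm seeing.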
Thanks in advance.