I have a cluster with the following configuration:
Number of nodes - 6, Machine type - M3.2xlarge, Cores per node - 8, Memory per node - 30 GB
I am running a Spark application on it that reads data from HDFS and sends it to SNS/SQS.
I am using the following command to run the job:
spark-submit --class com.message.processor.HDFSMessageReader --master yarn-client \
  --num-executors 17 --executor-cores 30 --executor-memory 6G \
  /home/hadoop/panther-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  /user/data/jsonData/* arn:aws:sns:us-east-1:618673372431:pantherSNS https://sns.us-east-1.amazonaws.com true
I referred to a blog post by Cloudera to calculate the number of executors and the executor memory.
I am keeping the number of executors at the maximum (17) and varying the number of cores per executor. These are the results I got:
Scenario 1 -- --num-executors = 17, --executor-cores = 3
Result -- Total messages sent to SQS via SNS = 1.2 million
Scenario 2 -- --num-executors = 17, --executor-cores = 10
Result -- Total messages sent to SQS via SNS = 4.4 million
Scenario 3 -- --num-executors = 17, --executor-cores = 20
Result -- Total messages sent to SQS via SNS = 8.5 million
Scenario 4 -- --num-executors = 17, --executor-cores = 30
Result -- Total messages sent to SQS via SNS = 12.7 million
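To put the four scenarios side by side, here is a small Python sketch of the arithmetic. It only uses the numbers quoted above. The names (summarize, task_slots and so on) are mine, and it assumes that YARN really granted all 17 executors with the requested cores and that each task uses one core (the Spark default), neither of which I have verified.

```python
# Figures copied from the question; nothing here is measured independently.
NODES = 6
CORES_PER_NODE = 8
NUM_EXECUTORS = 17  # assumed to be fully granted by YARN in every run

# --executor-cores value -> total messages sent, as reported per scenario
RESULTS = {3: 1_200_000, 10: 4_400_000, 20: 8_500_000, 30: 12_700_000}


def summarize(results=RESULTS):
    """Relate requested task slots to physical cores and to messages sent."""
    physical_cores = NODES * CORES_PER_NODE
    rows = []
    for cores, messages in sorted(results.items()):
        # One concurrent task per requested core (assumes spark.task.cpus = 1).
        task_slots = NUM_EXECUTORS * cores
        rows.append({
            "executor_cores": cores,
            "task_slots": task_slots,
            "slots_per_physical_core": task_slots / physical_cores,
            "messages_per_slot": messages / task_slots,
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

Under those assumptions the cluster has 48 physical cores while the runs request between 51 and 510 task slots, and the messages sent per slot stay in a narrow band (roughly 23,500 to 25,900) across all four scenarios.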
How can these results be explained?
Could you please explain why the application's performance keeps increasing as I increase the number of executor cores, even well beyond the 8 cores that each node has?