
I have a cluster with the following configuration:

Number of nodes: 6; machine type: m3.2xlarge; cores per node: 8; memory per node: 30 GB.

I am running a Spark application on it that reads data from HDFS and sends messages to SQS via SNS.

I am using the following command to run this job:

    spark-submit --class com.message.processor.HDFSMessageReader \
      --master yarn-client \
      --num-executors 17 --executor-cores 30 --executor-memory 6G \
      /home/hadoop/panther-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
      /user/data/jsonData/* \
      arn:aws:sns:us-east-1:618673372431:pantherSNS \
      https://sns.us-east-1.amazonaws.com true
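
Roughly, the job follows the usual read-from-HDFS-and-publish pattern (a minimal sketch assuming a text input and the AWS SDK v1 SNS client; the actual HDFSMessageReader may differ in details):

    import com.amazonaws.services.sns.AmazonSNSClient
    import com.amazonaws.services.sns.model.PublishRequest
    import org.apache.spark.{SparkConf, SparkContext}

    object HDFSMessageReader {
      def main(args: Array[String]): Unit = {
        val Array(inputPath, topicArn, endpoint, _) = args
        val sc = new SparkContext(new SparkConf().setAppName("HDFSMessageReader"))
        sc.textFile(inputPath).foreachPartition { lines =>
          // One SNS client per task; an executor runs --executor-cores tasks at once.
          val sns = new AmazonSNSClient()
          sns.setEndpoint(endpoint)
          // publish() blocks on an HTTPS round trip, so a task spends most of
          // its time waiting on the network rather than using its core.
          lines.foreach(line => sns.publish(new PublishRequest(topicArn, line)))
        }
        sc.stop()
      }
    }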

Here I am keeping the number of executors fixed at its maximum (17) and varying the number of executor cores; the results are in the table below.

To calculate the number of executors and the executor memory, I referred to the blog post published by Cloudera; my working is sketched below.
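
This is my reconstruction of the arithmetic from that post (the 1 core and ~1 GB reserved per node for OS and Hadoop daemons, and the split into 3 executors per node, follow the post's guidance; the exact breakdown is an assumption on my part):

    // Executor sizing following the Cloudera guidance (my arithmetic):
    val nodes = 6
    val usableCoresPerNode = 8 - 1    // reserve 1 core for OS/Hadoop daemons
    val usableMemPerNodeGb = 30 - 1   // reserve ~1 GB likewise

    val executorsPerNode = 3
    val numExecutors = nodes * executorsPerNode - 1  // one slot for the YARN AM => 17

    val memPerExecutorGb = usableMemPerNodeGb / executorsPerNode  // => 9
    // --executor-memory 6G stays comfortably below this 9 GB ceiling
    // once YARN's off-heap memory overhead is accounted for.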

| Scenario | --num-executors | --executor-cores | Messages sent to SQS via SNS |
|----------|-----------------|------------------|------------------------------|
| 1        | 17              | 3                | 1.2 million                  |
| 2        | 17              | 10               | 4.4 million                  |
| 3        | 17              | 20               | 8.5 million                  |
| 4        | 17              | 30               | 12.7 million                 |
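
For reference, Spark runs one task per executor core (with the default spark.task.cpus = 1), so the number of concurrent task slots requested in each scenario works out as follows:

    // Requested concurrent task slots = num-executors * executor-cores
    val slots = Seq(3, 10, 20, 30).map(cores => 17 * cores)
    // => List(51, 170, 340, 510)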

How can these results be explained?

Could you please explain why application performance increases as the number of executor cores increases?

  • Hmm... that sounds interesting. Have you set yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in capacity-scheduler.xml? What changes have you made to the yarn-site.xml file? – Glennie Helles Sindholt Mar 22 '16 at 11:43
  • 1. Can you check YARN for the actual resource distribution? The amount of resources actually allocated by YARN to the program would be a better variable here than the requested resources (what you show in the question). 2. Can you please clarify whether the results are consistent? How many times did you execute the program mentioned in the question? – Michael Kopaniov Mar 22 '16 at 14:27
  • @Glennie I haven't changed any Spark configuration; I am using the default Spark configuration. – suraj chopade Mar 23 '16 at 09:25
  • @Michael Yes, I verified it many times; the results are the same. – suraj chopade Mar 23 '16 at 09:29
  • Which EMR version are you using? On YARN (EMR 4.0, 4.1 and 4.2 at least) you need to change the `resource-calculator` to the `DominantResourceCalculator`, otherwise your `--num-executors` and `--executor-cores` flags are simply ignored and YARN will launch only 2 executors. Can you take a screenshot of the "Nodes of the cluster" page in the Resource Manager and post it? – Glennie Helles Sindholt Mar 23 '16 at 09:57
  • @Glennie I am using EMR version 4.2. And one thing: it launched all of the requested --num-executors, i.e. 17, with the default configuration. – suraj chopade Mar 23 '16 at 11:11
  • Oh, my bad - I got it mixed up. It's only `--executor-cores` that is affected by the default `resource-calculator` (http://stackoverflow.com/questions/29964792/apache-hadoop-yarn-underutilization-of-cores). Sorry for the confusion. – Glennie Helles Sindholt Mar 23 '16 at 13:33
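
(For reference, the property mentioned in the first comment is set in capacity-scheduler.xml; shown here as the standard Hadoop property block:)

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>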

0 Answers