
I'm using r4.4xlarge instances, which the pages below list as having 16 vCPUs and 8 virtual cores:

https://aws.amazon.com/ec2/pricing/on-demand/

https://aws.amazon.com/ec2/virtualcores/

3 questions:

1) How should I calculate spark.executor.cores for my Spark application on a standalone cluster? This has always been a confusing calculation for me (see the configuration sketch after the note below).

2) I assume the virtual cores are counted at the instance level, not at the vCPU level?

3) Say I have 3 worker nodes and one master node, all with the configuration above, and I have to submit multiple Spark applications at the same time. Will both run at the same time? Or will one job eat up all the resources, even though it doesn't need them, while the other waits in a queue? Or, if resources are available, will both applications be accepted and run at the same time?

Note: I'm using the Spark REST API to submit the above 2 applications as 2 separate spark-submits.
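
To make the settings these questions revolve around concrete, here is a minimal sketch, assuming a standalone master at spark://master-host:7077 (a placeholder, as is the application name) and the layout described above (3 workers with 16 vCPUs each); the numbers are illustrative, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

object App1Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("app-1")                       // placeholder application name
      .master("spark://master-host:7077")     // placeholder standalone master URL
      // Cores each executor may use. On a standalone cluster, if this is left
      // unset, an executor takes all available cores on its worker.
      .config("spark.executor.cores", "8")
      // Total cores this application may hold across the cluster
      // (here 3 workers x 8 cores = 24 of the 48 vCPUs).
      .config("spark.cores.max", "24")
      // Memory per executor; size it against the worker's RAM, leaving
      // headroom for the OS and any other application's executors.
      .config("spark.executor.memory", "20g")
      .getOrCreate()

    // ... job logic ...
    spark.stop()
  }
}
```

The 8/24 split is just one way to leave room on each worker for a second application; the right values depend on the workload, as the comments below point out.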

shiv455
  • Well, do you want to run 1 executor, or many? Set it to 1, or set it to 8... Or 2 or 3. This really depends on your workload, not just the server types – OneCricketeer Feb 13 '18 at 23:45
  • I’m using standalone and it supports 1 executor per worker node – shiv455 Feb 13 '18 at 23:53
  • Okay, well you can still control the core count. https://stackoverflow.com/a/39400195/2308683 – OneCricketeer Feb 13 '18 at 23:54
  • So my question is how to calculate spark.executor.cores here. Will all vCPUs be treated as 1 executor? – shiv455 Feb 13 '18 at 23:55
  • But, if you don't want to worry about actually setting a value, try using dynamicAllocation like EMR does. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html – OneCricketeer Feb 14 '18 at 00:00
  • Say I have 3 worker nodes and one master node, all with the above configuration, and I have to submit multiple Spark applications at the same time. Will both run at the same time? Or will one job eat up all the resources though it doesn't need them while the other stays in a queue, or, if resources are available, will both jobs be executed at once? – shiv455 Feb 14 '18 at 00:01
  • That's not right, I guess; the virtual cores are at the instance level, not at the vCPU level. Correct me if I'm wrong – shiv455 Feb 14 '18 at 00:03
  • I don't have any experience with Standalone deployments, so I can't really say how the jobs are scheduled, but as the docs say for the `spark.deploy.defaultCores` property... *If not set, applications always get all available cores unless they configure spark.cores.max themselves. Set this lower on a shared cluster to prevent users from grabbing the whole cluster by default*, therefore you can only control `spark.cores.max` per application – OneCricketeer Feb 14 '18 at 00:11
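
Following the dynamic-allocation suggestion in the comments above, here is a minimal sketch of that alternative; it assumes the external shuffle service has been enabled on every worker, and the application name and master URL are again placeholders:

```scala
import org.apache.spark.sql.SparkSession

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("app-2")                                    // placeholder application name
      .master("spark://master-host:7077")                  // placeholder standalone master URL
      // Let Spark add and remove executors as the workload changes
      // instead of pinning a fixed spark.cores.max up front.
      .config("spark.dynamicAllocation.enabled", "true")
      // Dynamic allocation needs the external shuffle service so executors
      // can be released without losing their shuffle files.
      .config("spark.shuffle.service.enabled", "true")
      // Optional ceiling; with one executor per worker this means at most
      // 3 executors for this application.
      .config("spark.dynamicAllocation.maxExecutors", "3")
      // Still limits how many cores a single executor uses on its worker.
      .config("spark.executor.cores", "8")
      .getOrCreate()

    // ... job logic ...
    spark.stop()
  }
}
```

With this, each application asks for executors only while it has pending work, rather than holding a fixed share of the cluster for its whole lifetime.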

0 Answers