
I'm using r4.4xlarge instances, which the pages below list as having 16 vCPUs and 8 virtual cores:

https://aws.amazon.com/ec2/pricing/on-demand/

https://aws.amazon.com/ec2/virtualcores/

3 questions:

1) How should I calculate spark.executor.cores for my Spark application on a standalone cluster? This has always been a confusing calculation for me (see the configuration sketch after the note below).

2) I assume the virtual cores are counted at the instance level, not at the vCPU level?

3) Say I have 3 worker nodes and one master node, all with the configuration above, and I have to submit multiple Spark applications at the same time. Will both run at the same time? Or will one job eat up all the resources, even though it doesn't need them, while the other waits in a queue? Or, if resources are available, will both applications be accepted and run at the same time?

Note: I'm using the Spark REST API to submit the above 2 applications as 2 separate spark-submits.
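
To make the settings these questions revolve around concrete, here is a minimal sketch, assuming a standalone master at spark://master-host:7077 (a placeholder, as is the application name) and the layout described above (3 workers with 16 vCPUs each); the numbers are illustrative, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

object App1Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("app-1")                       // placeholder application name
      .master("spark://master-host:7077")     // placeholder standalone master URL
      // Cores each executor may use. On a standalone cluster, if this is left
      // unset, an executor takes all available cores on its worker.
      .config("spark.executor.cores", "8")
      // Total cores this application may hold across the cluster
      // (here 3 workers x 8 cores = 24 of the 48 vCPUs).
      .config("spark.cores.max", "24")
      // Memory per executor; size it against the worker's RAM, leaving
      // headroom for the OS and any other application's executors.
      .config("spark.executor.memory", "20g")
      .getOrCreate()

    // ... job logic ...
    spark.stop()
  }
}
```

The 8/24 split is just one way to leave room on each worker for a second application; the right values depend on the workload, as the comments below point out.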

shiv455
  • Well, do you want to run 1 executor, or many? Set it to 1, or set it to 8... Or 2 or 3. This really depends on your workload, not just the server types – OneCricketeer Feb 13 '18 at 23:45
  • I’m using standalone and it supports 1 executor per worker node – shiv455 Feb 13 '18 at 23:53
  • Okay, well you can still control the core count. https://stackoverflow.com/a/39400195/2308683 – OneCricketeer Feb 13 '18 at 23:54
  • So my question is how to calculate spark.executor.cores here. Will all vCPUs be treated as 1 executor? – shiv455 Feb 13 '18 at 23:55
  • But, if you don't want to worry about actually setting a value, try using dynamicAllocation like EMR does. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html – OneCricketeer Feb 14 '18 at 00:00
  • Say I have 3 worker nodes and one master node, all with the above configuration, and I have to submit multiple Spark applications at the same time. Will both run at the same time? Or will one job eat up all the resources though it doesn't need them while the other stays in a queue, or, if resources are available, will both jobs be executed at once? – shiv455 Feb 14 '18 at 00:01
  • That's not right, I guess; the virtual cores are at the instance level, not at the vCPU level. Correct me if I'm wrong – shiv455 Feb 14 '18 at 00:03
  • I don't have any experience with Standalone deployments, so I can't really say how the jobs are scheduled, but as the docs say for the `spark.deploy.defaultCores` property... *If not set, applications always get all available cores unless they configure spark.cores.max themselves. Set this lower on a shared cluster to prevent users from grabbing the whole cluster by default*, therefore you can only control `spark.cores.max` per application – OneCricketeer Feb 14 '18 at 00:11
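
Following the dynamic-allocation suggestion in the comments above, here is a minimal sketch of that alternative; it assumes the external shuffle service has been enabled on every worker, and the application name and master URL are again placeholders:

```scala
import org.apache.spark.sql.SparkSession

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("app-2")                                    // placeholder application name
      .master("spark://master-host:7077")                  // placeholder standalone master URL
      // Let Spark add and remove executors as the workload changes
      // instead of pinning a fixed spark.cores.max up front.
      .config("spark.dynamicAllocation.enabled", "true")
      // Dynamic allocation needs the external shuffle service so executors
      // can be released without losing their shuffle files.
      .config("spark.shuffle.service.enabled", "true")
      // Optional ceiling; with one executor per worker this means at most
      // 3 executors for this application.
      .config("spark.dynamicAllocation.maxExecutors", "3")
      // Still limits how many cores a single executor uses on its worker.
      .config("spark.executor.cores", "8")
      .getOrCreate()

    // ... job logic ...
    spark.stop()
  }
}
```

With this, each application asks for executors only while it has pending work, rather than holding a fixed share of the cluster for its whole lifetime.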

0 Answers