
I am running several streaming Spark jobs and batch Spark jobs in the same EMR cluster. Recently, one batch Spark job was programmed incorrectly and consumed a lot of memory. This caused the master node to stop responding and all other Spark jobs to get stuck, which means the whole EMR cluster was basically down.

Is there a way to restrict the maximum memory that a single Spark job can consume? If a Spark job consumes too much memory it can be allowed to fail, but we do not want the whole EMR cluster to go down.

The Spark jobs are running in client mode with a spark-submit command like the one below.

spark-submit --driver-memory 2G --num-executors 1 --executor-memory 2G --executor-cores 1 --class test.class s3://test-repo/mysparkjob.jar

The cluster's yarn-site classification is configured as follows:

    {
        'Classification': 'yarn-site',
        'Properties': {
            'yarn.nodemanager.disk-health-checker.enable': 'true',
            'yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage': '95.0',
            'yarn.nodemanager.localizer.cache.cleanup.interval-ms': '100000',
            'yarn.nodemanager.localizer.cache.target-size-mb': '1024',
            'yarn.nodemanager.pmem-check-enabled': 'false',
            'yarn.nodemanager.vmem-check-enabled': 'false',
            'yarn.log-aggregation.retain-seconds': '12000',
            'yarn.log-aggregation-enable': 'true',
            'yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds': '3600',
            'yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler'
        }
    }

[Available memory metrics diagram for the cluster; the deep drop corresponds to the problematic job]

Thanks!

yyuankm

2 Answers


You can utilize yarn.nodemanager.resource.memory-mb

The total amount of memory that YARN can use on a given node.

Example: if your machine has 16 GB of RAM and you set this property to 12 GB, at most 6 executors or drivers will be launched (since you are using 2 GB per executor/driver), and the remaining 4 GB stays free for background processes.
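
On EMR this could be set through the same yarn-site classification used in the question. A minimal sketch, assuming a 16 GB node that you want to cap at 12 GB (the value is illustrative, not a recommendation):

    {
        'Classification': 'yarn-site',
        'Properties': {
            # Cap the memory YARN may hand out on each node at 12 GB (value is in MB)
            'yarn.nodemanager.resource.memory-mb': '12288'
        }
    }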

Sanket9394
  • Thanks. I will try this setting. So it restricts the total amount of memory that YARN can use on a given node. Are there other parameters or spark-submit options that can restrict the total amount of memory for a specific Spark job? I thought executor-memory would give this restriction; however, the Spark job can still consume much more memory than the value defined in executor-memory. – yyuankm Jul 14 '21 at 02:40
  • The total memory used by a Spark job will be `driver memory + num_executors * executor memory`, unless you have set `spark.dynamicAllocation.enabled` to `true`. In that case, num_executors can scale up based on workload. – Sanket9394 Jul 14 '21 at 04:35
  • Thanks. That is also what I understood. According to my spark-submit, the Spark job should not be able to consume more than 4 GB, and I do not set spark.dynamicAllocation.enabled either. However, I noticed that the Spark job consumed around 20 GB, which brought the whole EMR cluster down. I also put the available memory metrics diagram in the original question; the deep drop is due to this Spark job. – yyuankm Jul 14 '21 at 06:57

Option 1: You can run your spark-submit in cluster mode instead of client mode. That way your master node will always be free to do other work. You can also choose a smaller master instance if you want to save cost.

Advantage: since the Spark driver will be created on a CORE node, you can add auto-scaling to it, and you will be able to use 100% of the cluster resources. Read more here: Spark yarn cluster vs client - how to choose which one to use?
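
For example, keeping the same resources as the command in the question, the only change is the deploy mode (a sketch, assuming the same jar and class):

    spark-submit --deploy-mode cluster --driver-memory 2G --num-executors 1 --executor-memory 2G --executor-cores 1 --class test.class s3://test-repo/mysparkjob.jar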


Option 2: You can create YARN queues and submit memory-heavy jobs to a separate queue.

So let's say you configure two queues, Q1 and Q2. You configure Q1 to take at most 80% of the total resources, and you submit normal jobs to Q2, which has no maximum limit. For memory-heavy jobs you choose queue Q1 (see the sketch below).
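
A minimal sketch of what this might look like, assuming you keep the FairScheduler that is already set in the yarn-site classification above; the queue names and resource caps are illustrative, not something EMR configures for you:

    <!-- fair-scheduler.xml (the FairScheduler allocation file): Q1 is capped, Q2 is not -->
    <allocations>
      <queue name="Q1">
        <maxResources>12288 mb,6 vcores</maxResources>
      </queue>
      <queue name="Q2"/>
    </allocations>

A memory-heavy job would then be sent to the capped queue with the --queue option of spark-submit:

    spark-submit --queue Q1 --deploy-mode cluster --driver-memory 2G --num-executors 1 --executor-memory 2G --executor-cores 1 --class test.class s3://test-repo/mysparkjob.jar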


Given your requirement, I think Option 1 suits you better, and it's easy to implement with no infra change.
With Option 2, when we did it on emr-5.26.0 we faced many challenges configuring the YARN queues.

Snigdhajyoti
  • Thanks! Cluster mode is definitely something I planned to change. May I also take it that the maximum memory for each Spark job is limited to num_executors x executor_memory? – yyuankm Jul 15 '21 at 10:09
  • Yes, it is. But there is only one driver, which will be created on the master, and there is the driver_memory property you can use to restrict the driver's total memory. The driver doesn't need too much memory. – Snigdhajyoti Jul 16 '21 at 21:55