
I am using Apache Spark (2.4.4) on YARN for better job scheduling and to enable multiple concurrent jobs, but my job stays in the ACCEPTED state and never moves forward.

My cluster consists of 10 nodes (1 master and 9 slaves), with 8 CPUs and 16 GB of memory per node. YARN is working perfectly fine: I can see the nodes in the UI and none are unhealthy. Still, jobs sit in the ACCEPTED state for a long time and eventually reach FAILED. I read through different articles, and everywhere it is mentioned that "YARN doesn't have enough resources to allocate", but my cluster is fairly big and I am running a small job. Any help with this would be appreciated.

yarn-site.xml

<configuration>
    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>${hostname}</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>12288</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>12288</value>
    </property>
</configuration>
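For what it's worth, here is a rough sketch of the container math, assuming Spark's default memory overhead of max(384 MB, 10% of the heap) and YARN rounding each request up to `yarn.scheduler.minimum-allocation-mb`; both values are taken from the config above, the rest are Spark defaults:

```python
import math

MIN_ALLOCATION_MB = 1024   # yarn.scheduler.minimum-allocation-mb (from yarn-site.xml)
OVERHEAD_MIN_MB = 384      # Spark's default minimum memory overhead
OVERHEAD_FACTOR = 0.10     # Spark's default overhead factor (10% of the heap)

def container_size_mb(heap_mb):
    """Memory YARN actually reserves for one Spark container."""
    requested = heap_mb + max(OVERHEAD_MIN_MB, int(heap_mb * OVERHEAD_FACTOR))
    # YARN rounds each request up to a multiple of the minimum allocation
    return math.ceil(requested / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB

driver = container_size_mb(1024)         # --driver-memory 1g
executors = 2 * container_size_mb(1024)  # default of 2 executors, 1g each
total = driver + executors
print(total)  # total MB the application needs before it can leave ACCEPTED
```

With 9 NodeManagers at 12288 MB each (110592 MB total), this request of a few GB should be trivially satisfiable, which is why I suspect scheduling rather than capacity.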

Yarn UI

(screenshot: YARN ResourceManager UI showing all nodes healthy)

Applications

(screenshot: applications list with the job in ACCEPTED state)

Fair scheduler options

<?xml version="1.0" encoding="UTF-8"?>
<allocations>
   <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
   <queue name="prod">
      <weight>40</weight>
      <schedulingPolicy>fifo</schedulingPolicy>
   </queue>
   <queue name="dev">
      <weight>60</weight>
      <queue name="eng" />
      <queue name="science" />
   </queue>
   <queuePlacementPolicy>
      <rule name="specified" create="false" />
      <rule name="primaryGroup" create="false" />
      <rule name="default" queue="dev.eng" />
   </queuePlacementPolicy>
</allocations>
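As a sanity check on the allocation file, this sketch mirrors how the Fair Scheduler would split cluster memory by queue weight when both top-level queues are active (the `fair_shares` helper is mine, not a scheduler API):

```python
def fair_shares(total_mb, weights):
    """Split a resource among queues proportionally to their weights."""
    total_weight = sum(weights.values())
    return {queue: total_mb * w / total_weight for queue, w in weights.items()}

# Nine NodeManagers at yarn.nodemanager.resource.memory-mb = 12288
cluster_mb = 9 * 12288

shares = fair_shares(cluster_mb, {"prod": 40, "dev": 60})
# dev's share is in turn split equally between dev.eng and dev.science
# (both have the default weight of 1), so each gets 30% of the cluster.
```

Since my spark-submit does not pass `--queue`, the placement rules above should route the job to `dev.eng` via the `default` rule, which still has tens of GB of fair share available.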

Spark submit command

./bin/spark-submit --class Main --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g aggregator-1.0-SNAPSHOT.jar
Arjun
  • show us your spark-submit command and executor/driver configuration – maxime G Aug 17 '20 at 15:30
  • Updated the spark submit command in the description – Arjun Aug 17 '20 at 15:51
  • 12288 mb memory for yarn? – Lamanus Aug 17 '20 at 22:39
  • Spark executors run inside YARN containers, right? I need one executor to have 12 GB of memory. https://stackoverflow.com/questions/38331502/spark-on-yarn-resource-manager-relation-between-yarn-containers-and-spark-execu – Arjun Aug 18 '20 at 04:26

0 Answers