
Input file size: 75 GB

Number of mappers: 2273

Number of reducers: 1 (as shown in the web UI)

Number of splits: 2273

Number of input files: 867

Cluster: Apache Hadoop 2.4.0

5-node cluster, 1 TB each.

1 master and 4 datanodes.

It has been 4 hours now and only 12% of the map phase is complete. Given my cluster configuration, does this make sense, or is there something wrong with the configuration?

yarn-site.xml

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>32</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
        <description>Physical memory, in MB, to be made available to running containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether virtual memory limits will be enforced for containers.</description>
    </property>

It is a MapReduce job where I am using multiple outputs, so the reducer will emit multiple files. Each machine has 15 GB of RAM. 8 containers are running, and the total memory available is 32 GB in the RM web UI.

Any guidance is appreciated. Thanks in advance.

Shash
  • Can you provide information about what type of job you are running? Also, what is the RAM available on each machine? Can you log in to the Resource Manager UI and check the total memory available to the cluster and the number of containers running in parallel? I suspect the job is under-utilizing the resources. – Shivanand Pawar Feb 19 '16 at 12:17
  • @Shivanand Pawar: It is a MapReduce job where I am using multiple outputs, so I will have multiple files. Each machine has 15 GB of RAM. 8 containers are running. Total memory available is 32 GB. – Shash Feb 19 '16 at 12:49

1 Answer


A few points to check:

  1. The block and split sizes seem very small for the data you shared: 2273 splits over 75 GB works out to roughly 34 MB per split. Try increasing both to an optimal level (see the configuration sketch after this list).

  2. If you are not already doing so, use a custom partitioner that spreads your data uniformly across the reducers.

  3. Consider using a combiner.

  4. Consider applying appropriate compression when storing the intermediate map output.

  5. Use an appropriate block replication factor.

  6. Increase the number of reducers as appropriate.

These will help increase performance. Give it a try and share your findings!
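
For illustration only, points 1, 4, 5 and 6 could be expressed as configuration along these lines. The property names are the standard Hadoop 2.x ones; every value is a hypothetical starting point, not a recommendation for your cluster. The partitioner and combiner from points 2 and 3 are set in the job driver code rather than in configuration.

    <!-- Hypothetical starting values; tune for your own data and cluster. -->
    <property>
        <name>mapreduce.input.fileinputformat.split.minsize</name>
        <value>134217728</value> <!-- 128 MB splits instead of ~34 MB -->
    </property>
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value> <!-- compress intermediate map output -->
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value> <!-- block replication factor (hdfs-site.xml) -->
    </property>
    <property>
        <name>mapreduce.job.reduces</name>
        <value>8</value> <!-- more than one reducer, if the job logic allows it -->
    </property>

Note that raising the minimum split size only merges blocks within a single file, so with 867 input files the number of splits cannot drop below 867.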

Edit 1: Try comparing the log of a successful map task with that of a long-running map task attempt (12% means roughly 272 of the 2273 map tasks have completed). That will show you where it is getting stuck.

Edit 2: Tweak these parameters: yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.memory-mb, mapreduce.map.memory.mb, mapreduce.map.java.opts, mapreduce.reduce.memory.mb, mapreduce.reduce.java.opts, mapreduce.task.io.sort.mb, mapreduce.task.io.sort.factor

These should improve the situation. Take a trial-and-error approach; a hypothetical starting point is sketched below.
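
As a minimal sketch only, assuming the ~8 GB per node given to YARN in the yarn-site.xml above and keeping each heap (-Xmx) at roughly 80% of its container size, a mapred-site.xml could start from values like these (all of them hypothetical examples to tune from):

    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>2048</value> <!-- container size for map tasks -->
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1638m</value> <!-- ~80% of the map container -->
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value> <!-- container size for reduce tasks -->
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx3276m</value> <!-- ~80% of the reduce container -->
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value> <!-- sort buffer; must fit inside the map heap -->
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>64</value> <!-- number of streams merged at once while sorting -->
    </property>

The yarn.scheduler.*-allocation-mb and yarn.nodemanager.resource.memory-mb settings stay in yarn-site.xml.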

Also refer: Container is running beyond memory limits

Edit 3: Try converting a part of the logic into a Pig script, run it, and see how it behaves.

Marco99
  • I was also wondering why there is such a huge amount of input splits and just 1 reducer... Is this a self-written MR application, or is it using Hive/Pig? – Havnar Feb 19 '16 at 12:25
  • It's entirely a MapReduce program where we scan the data based on certain conditions. We are using multiple outputs, so the reducer will emit multiple files. – Shash Feb 19 '16 at 12:51
  • I can't make any additions or deletions to the code. I have just set up the cluster and am trying to run the job on the new cluster. – Shash Feb 19 '16 at 12:54
  • In the succeeded map task logs, I can see the note "container killed by the applicationmaster. container killed on request. exit code is 143". – Shash Feb 19 '16 at 14:23
  • Refer links: http://stackoverflow.com/questions/29001702/why-yarn-java-heap-space-memory-error http://stackoverflow.com/questions/30533501/hadoop-mapper-is-failing-because-of-container-killed-by-the-applicationmaster – Marco99 Feb 19 '16 at 15:45
  • Referred. Made the changes. Still the same. It's very slow in execution. Now after restarting it has executed 104 Mappers in 3 hrs. – Shash Feb 19 '16 at 18:00
  • 1000 mappers in 24 hrs :( – Shash Feb 20 '16 at 15:21
  • yarn.nodemanager.resource.memory-mb = 8 GB, yarn.scheduler.minimum-allocation-mb = 1 GB, yarn.scheduler.maximum-allocation-mb = 8 GB, mapreduce.map.memory.mb = 4 GB, mapreduce.reduce.memory.mb = 8 GB, mapreduce.map.java.opts = 3 GB, mapreduce.reduce.java.opts = 6 GB, yarn.app.mapreduce.am.resource.mb = 8 GB, yarn.app.mapreduce.am.command-opts = 6 GB – Shash Feb 21 '16 at 14:45
  • @marco99 : I had tweaked the parameters mentioned by you. But still the same. Can't understand what I am missing here :( – Shash Feb 21 '16 at 14:46
  • The MR job took 51 hrs. to complete and I got the desired output, but I still cannot understand what went wrong that made it take 51 hrs. I will continue my investigation and, if I am successful, will post my findings here. Thanks all for your help! – Shash Feb 23 '16 at 06:32
  • @Shash, first of all, congrats; your patience got the job completed. I still wonder why the block size can't be changed and the reducers can't be increased, since these directly affect parallelism and performance. – Marco99 Feb 23 '16 at 09:43