
I have a YARN cluster with dozens of nodes. My program is a map-only job. Its Avro input is very small in size (several million rows), but processing a single row requires a lot of CPU power. What I observe is that many map tasks run on a single node while other nodes do not participate, which makes some nodes very slow and hurts overall HDFS performance. I assume this behaviour is caused by Hadoop's data locality.

I'm curious whether it's possible to switch data locality off, or whether there is another way to force YARN to distribute map tasks more uniformly across the cluster?

Thanks!

Vyacheslav

1 Answer


Assuming you can't easily redistribute the data more uniformly across the cluster (surely not all of your data is on one node, right?!), the easiest way to relax locality seems to be this setting:

yarn.scheduler.capacity.node-locality-delay

This setting should have a default of 40; try setting it to 1 to see whether that has the desired effect. Perhaps even 0 could work.
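
In capacity-scheduler.xml that would look roughly like the sketch below; the value of 1 is just the experiment suggested above, not a recommended default:

    <!-- capacity-scheduler.xml (CapacityScheduler only) -->
    <property>
      <name>yarn.scheduler.capacity.node-locality-delay</name>
      <!-- Number of missed scheduling opportunities after which the
           scheduler stops insisting on node-local containers. Lower
           values give up locality sooner and spread tasks wider. -->
      <value>1</value>
    </property>

After changing it, refresh the queues (e.g. yarn rmadmin -refreshQueues) or restart the ResourceManager so the scheduler picks up the new value.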

Dennis Jaheruddin
    In case you use FairScheduler this could also be relevant: http://tech-blog.flipkart.net/2015/05/is-data-locality-always-out-of-the-box-in-hadoop-not-really/ – Dennis Jaheruddin Aug 10 '16 at 13:48
  • Thanks, I will check it out. The data size is actually 20MB in total, so most likely it's located on the same node + 2 replicas on other nodes. We started to use a fair scheduler recently, so thanks for the link too! – Vyacheslav Aug 10 '16 at 13:57
  • @Vyacheslav: 20 MB is way too small to process in Hadoop. – Marco99 Aug 10 '16 at 15:37
  • Input is small, but the processing time is very high: up to 1 minute per row, and I have 1 million rows in my input. We use Hadoop as a nice means to distribute the load automatically across many machines and to have elasticity depending on whether other YARN queues are free or busy. So I believe my use case makes sense. – Vyacheslav Aug 10 '16 at 16:18
  • @Vyacheslav: The following may not be a clean approach: minimize the block size in HDFS, extract the data from that Avro file as a text file in HDFS, and run the map-only job on that. – Marco99 Aug 10 '16 at 17:31
  • @Vyacheslav: Before doing the above, please read this post as well: http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop – Marco99 Aug 10 '16 at 17:39
  • @Marco99: actually I'm already using mapred.max.split.size=12720 to get more mappers that are "smaller" in size. That allows for better elasticity, and if one mapper fails for some reason, not so many records are affected (yes, I also have mapred.max.map.failures.percent=1 set; both settings are shown in the sketch after this thread). Regarding the block size: I cannot change it, since that would affect many other jobs that we have. – Vyacheslav Aug 11 '16 at 07:50
  • @Dennis: unfortunately, playing around with yarn.scheduler.capacity.node-locality-delay as well as yarn.scheduler.fair.locality.threshold.* didn't give me good results. The only thing that I managed to achieve is that all map tasks are running on the same node, which is the opposite of the desired effect :) – Vyacheslav Aug 11 '16 at 07:52
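
For readers who want to reproduce the tuning discussed in this thread, here is a minimal sketch of where the quoted settings live. The values are the ones mentioned in the comments above, not recommendations, and the FairScheduler thresholds reportedly did not help in this particular case:

    <!-- Per-job configuration (e.g. passed with -D on the command line):
         the split and failure settings Vyacheslav quotes above. -->
    <property>
      <name>mapred.max.split.size</name>
      <!-- Bytes per input split: tiny splits produce many small mappers. -->
      <value>12720</value>
    </property>
    <property>
      <name>mapred.max.map.failures.percent</name>
      <!-- Tolerate up to 1% failed map tasks before failing the job. -->
      <value>1</value>
    </property>

    <!-- yarn-site.xml (FairScheduler only): fraction of the cluster's
         scheduling opportunities to pass up while waiting for a
         node-local / rack-local container; 0 relaxes locality
         immediately. -->
    <property>
      <name>yarn.scheduler.fair.locality.threshold.node</name>
      <value>0</value>
    </property>
    <property>
      <name>yarn.scheduler.fair.locality.threshold.rack</name>
      <value>0</value>
    </property>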