
What's an ideal value for "mapred.reduce.slowstart.completed.maps" for a Hadoop job? What are the rules to follow to set it appropriately?

Thanks!

user414585

1 Answer

It depends on a number of characteristics of your job, cluster and utilization:

  1. How many map slots your job will require vs. the cluster's maximum map capacity: if you have a job that spawns thousands of map tasks but only 10 map slots in total (an extreme case to demonstrate the point), then starting your reducers early could deprive other jobs' reduce tasks of slots while they sit idle waiting for map output. In this case I would set your slowstart to a large value (0.999 or 1.0). This is also true if your mappers take an age to complete - let someone else use the reducers in the meantime.

  2. If your cluster is relatively lightly loaded (there is no contention for the reduce slots) and your mappers output a good volume of data, then a low value for slowstart will help your job finish earlier: while the remaining map tasks execute, the completed map output is already being shuffled over to the reducers.

There are probably more factors to consider.
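As a sketch of how this might be configured: the property can be set cluster-wide in `mapred-site.xml`, or overridden per job. Note that in MRv2/YARN the property was renamed to `mapreduce.job.reduce.slowstart.completedmaps`; the value is the fraction of map tasks that must complete before reducers are scheduled.

```xml
<!-- mapred-site.xml: schedule reducers only after 80% of map tasks finish -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

Per job, the same setting can typically be passed on the command line via the generic `-D` option (the jar name here is hypothetical), e.g. `hadoop jar myjob.jar -D mapred.reduce.slowstart.completed.maps=0.80 ...`.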

Chris White
  • Nice explanation - here is the [JIRA](https://issues.apache.org/jira/browse/MAPREDUCE-1184) with more discussion on the same. – Praveen Sripati Jul 07 '12 at 01:48
  • Thanks a lot :) So it is not about starting the reducer before the map phase is done; it's about pre-fetching data to the prepared (scheduled) reducer, and it really starts working only after all the data is loaded, right? – Вадим Парафенюк Nov 05 '18 at 20:40