
New tasks in Hadoop always have higher priority than speculative tasks.

Can anyone tell me how and where I can change this priority?

Self
  • can you please elaborate more on the issue that you face? perhaps this post will be useful: http://stackoverflow.com/questions/15164886/hadoop-speculative-task-execution – vefthym Dec 17 '15 at 19:26

1 Answer


The Hadoop Speculator uses an Estimator to estimate the run time of a task.

One of the main configuration parameters to control the speculative execution is: mapreduce.job.speculative.slowtaskthreshold (defined in mapred-site.xml and by default set to 1.0).

The definition of this parameter says:

The number of standard deviations by which a task's ave progress-rates must be lower than the average of all running tasks' for the task to be considered too slow.

This means the progress rate of each task is compared against the mean progress rate of all the other tasks in the job, multiplied by the value of mapreduce.job.speculative.slowtaskthreshold.

Let me explain this with an example:

Let's assume there are 5 map tasks, the average progress rate is 70%, and mapreduce.job.speculative.slowtaskthreshold is set to 1.0.

Let's assume one of the map tasks is running slow, with a progress rate of 50%. Since (70 × mapreduce.job.speculative.slowtaskthreshold) = (70 × 1.0) = 70%, and 50% is less than 70%, this map task will be scheduled for speculative execution (assuming mapreduce.map.speculative is set to true).
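The comparison above can be sketched in a few lines of Python. This is a hypothetical illustration of the logic described here, not Hadoop's actual Speculator/Estimator code; the function name and progress-rate values are made up for the example:

```python
def is_speculation_candidate(task_rate, all_rates, slowtaskthreshold=1.0):
    """Return True if a task's progress rate falls below the mean
    progress rate of all tasks multiplied by the threshold."""
    mean_rate = sum(all_rates) / len(all_rates)
    return task_rate < mean_rate * slowtaskthreshold

# Five map tasks averaging 70%: four at 75% and one lagging at 50%.
rates = [0.75, 0.75, 0.75, 0.75, 0.50]

print(is_speculation_candidate(0.50, rates))  # the slow task qualifies
print(is_speculation_candidate(0.75, rates))  # the others do not
```

Raising slowtaskthreshold raises the bar (mean × threshold), so more tasks fall below it and become candidates for speculation.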

So, I guess that to enable more aggressive speculation, you need to set mapreduce.job.speculative.slowtaskthreshold to a higher value.
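As a sketch, that would look like the following in mapred-site.xml. The value 1.5 here is just an illustrative choice to make speculation more aggressive, not a tested recommendation:

```xml
<!-- mapred-site.xml: raise the slow-task threshold so more tasks
     count as "too slow" and become speculation candidates -->
<property>
  <name>mapreduce.job.speculative.slowtaskthreshold</name>
  <value>1.5</value>
</property>
<!-- speculative execution must also be enabled for map tasks -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
</property>
```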

But even after enabling aggressive speculation, you won't be able to start the redundant tasks immediately after the original task starts, because speculative execution only comes into the picture after some of the tasks have started and one of the running tasks is lagging behind (the Estimator gives this input to the Speculator). So you may have to change the Speculator class (org.apache.hadoop.mapreduce.v2.app.speculate.Speculator) to achieve that.

But it is recommended not to use this aggressively, since it could starve other jobs (the same job would occupy too many map/reduce slots due to speculative execution).

Please check this article by Qubole on the same topic: http://docs.qubole.com/en/latest/user-guide/hadoop/hadoop1/speculation.html

Manjunath Ballur