In Spark, I know that errors are recovered by recomputing RDDs from their lineage, unless an RDD is cached; in that case, the computation can restart from the cached RDD.

My question is: how are errors recovered in MapReduce frameworks (such as Apache Hadoop)? Let us say a failure occurred in the shuffle phase (that is, after map and before reduce); how would it be recovered? Would the map step be performed again? Is there any stage in MapReduce where output is stored in HDFS, so that computation can restart only from there? And what about a Map after Map-Reduce: is the output of reduce stored in HDFS?

pythonic
Of course it's not. I am talking about the MapReduce framework. I was just saying that I know how this works in Apache Spark, but I am curious about how failure recovery happens in MapReduce frameworks such as Apache Hadoop. – pythonic Oct 23 '16 at 14:36

1 Answer

What you are referring to is classified as a task failure, where the task could be either a map task or a reduce task.

In case of a particular task failure, Hadoop allocates another computational resource and reruns the failed map or reduce task there.

A failure of the shuffle-and-sort process is essentially a failure of the reduce task on that particular node, and the task would be set to run afresh on another resource (note that the reduce phase begins with the shuffle-and-sort process).

Of course, it would not reschedule failing tasks indefinitely. The two properties below determine how many failed attempts of a task are acceptable.

mapred.map.max.attempts for map tasks and mapred.reduce.max.attempts for reduce tasks.

By default, if any task fails four times (or whatever you configure in those properties), the whole job is considered to have failed. - Hadoop: The Definitive Guide
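For illustration, here is a minimal sketch of raising those limits programmatically before submitting a job. The class name and the value 6 are hypothetical; the property names are the classic ones quoted above (on Hadoop 2.x+ the equivalents are mapreduce.map.maxattempts and mapreduce.reduce.maxattempts).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class AttemptLimitSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Classic property names; Hadoop 2.x+ also accepts
            // mapreduce.map.maxattempts / mapreduce.reduce.maxattempts.
            conf.setInt("mapred.map.max.attempts", 6);    // up to 6 attempts per map task
            conf.setInt("mapred.reduce.max.attempts", 6); // up to 6 attempts per reduce task
            Job job = Job.getInstance(conf, "attempt-limit-sketch");
            // ... set mapper, reducer, and input/output paths here,
            // then submit with job.waitForCompletion(true).
        }
    }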

In short, since shuffle and sort are part of the reduce task, a failure there only triggers a new attempt of the reduce task. Map tasks would not be re-run, as they are considered completed.

Is there any stage in MapReduce where output is stored in the HDFS, so that computation can restart only from there?

Only the final output is stored in HDFS. Map outputs are classified as intermediate data and are written to the local disk of the node running the map task, not to HDFS: HDFS would replicate everything stored in it, and there is no point paying that cost for intermediate data that is of no use after the job completes. There would also be the additional overhead of cleaning it up. Hence map outputs are not stored in HDFS.
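To see where that intermediate data actually lives, here is a minimal sketch that prints the node-local spill directories. The class name is hypothetical; it assumes the Hadoop 2.x+ property name mapreduce.cluster.local.dir (classic name: mapred.local.dir).

    import org.apache.hadoop.mapred.JobConf;

    public class IntermediateDirSketch {
        public static void main(String[] args) {
            // JobConf pulls in mapred-default.xml, so the property resolves
            // to its default: local filesystem directories, not HDFS paths.
            JobConf conf = new JobConf();
            System.out.println("map-side spill dirs (local disk): "
                    + conf.get("mapreduce.cluster.local.dir"));
        }
    }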

And what about a Map after Map-Reduce. Is output of reduce stored in HDFS?

The output of the reducer is stored in HDFS. For the map side, I hope the description above suffices.
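As a sketch of the "Map after Map-Reduce" case: because the first job's reducer output is persisted in HDFS, a second, map-only job can simply read that output as its input. The class name and the paths /in, /tmp/stage1, and /out are hypothetical, and the mapper/reducer classes are elided.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainedJobsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Job 1: a full map + reduce pass; the reducer output is written to HDFS.
            Job job1 = Job.getInstance(conf, "stage-1");
            FileInputFormat.addInputPath(job1, new Path("/in"));           // hypothetical input
            FileOutputFormat.setOutputPath(job1, new Path("/tmp/stage1")); // reducer output in HDFS
            // ... set job1's Mapper, Reducer, and key/value classes here
            if (!job1.waitForCompletion(true)) System.exit(1);

            // Job 2: a map-only job ("Map after Map-Reduce") reading job 1's HDFS output.
            Job job2 = Job.getInstance(conf, "stage-2");
            job2.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
            FileInputFormat.addInputPath(job2, new Path("/tmp/stage1"));
            FileOutputFormat.setOutputPath(job2, new Path("/out"));        // hypothetical final output
            // ... set job2's Mapper and key/value classes here
            System.exit(job2.waitForCompletion(true) ? 0 : 1);
        }
    }

Because the intermediate result between the two jobs sits in HDFS, a failure in job 2 only reruns job 2; job 1's output survives.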

Kamal Kunjapur
  • That was very useful. So, in case the data from the map gets lost and a reducer task depending on that data also fails, what would happen? I guess the recomputation would restart from the map. Right? – pythonic Oct 23 '16 at 15:23
  • 'Reducer task' has three phases: Shuffle, Sort, and Reduce. While Shuffle can start as soon as a few maps are done, Sort and Reduce only start once all maps are done. You can refer to this link as well for more info - http://stackoverflow.com/a/11673808/3838328 – Kamal Kunjapur Oct 23 '16 at 15:29