When executing a MR job, Hadoop divides the input data into N Splits and then starts the corresponding N Map programs to process them separately.
1.How is the data divided (splited into different inputSplits)?
2.How is Split scheduled (how do you decide which TaskTracker machine the Map program that handles Split should run on)?
3.How to read the divided data?
4.How Reduce task assigned ?
In hadoop1.X
In hadoop 2.x
Both of the questions has some relationship , so I asked them together , you can show which of the part you are good at .
thanks in advance .