I know how MapReduce works and which steps it has:
- Mapping
- Shuffle and sorting
- Reducing
Of course there are also partitioning and combiners, but those aren't important right now.
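To make my mental model concrete, here is a minimal single-process sketch of those three steps as I understand them. It is a word count example of my own; the function names `map_phase`, `shuffle_and_sort` and `reduce_phase` are made up for illustration and are not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(records):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle_and_sort(pairs):
    """Group values by key, then return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return [(key, sum(values)) for key, values in grouped]

lines = ["b a", "a c a"]
result = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(result)  # [('a', 3), ('b', 1), ('c', 1)]
```

In this sketch the reduce step clearly cannot begin until every mapper output has been grouped and sorted, which is the source of my confusion below.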
What's interesting is that when I run MapReduce jobs, it looks like the mappers and reducers work in parallel, and I don't understand how that is possible.
Question 1. If I have multiple nodes performing the map operation, how can a reducer start working? A reducer can't start without sorted input, right? (The input to a reducer must be sorted, and while the mappers are still running the input can't be fully sorted yet.)
Question 2. If I have multiple reducers, how is the final data merged together? In other words, the final result should be sorted, right? Does that mean we spend an additional O(n log n) time merging the outputs of the multiple reducers?
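To illustrate what I mean by merging in Question 2, here is a small sketch of the step I have in mind. It assumes each reducer's output is already sorted by key; the `reducer_outputs` data is made up, and I use Python's `heapq.merge` only as a stand-in for whatever the framework would actually do, if it does this at all.

```python
import heapq

# Hypothetical outputs of three reducers, each already sorted by key.
reducer_outputs = [
    [("a", 3), ("d", 1)],
    [("b", 2), ("e", 5)],
    [("c", 4)],
]

# Merge the already-sorted lists into one globally sorted list.
merged = list(heapq.merge(*reducer_outputs))
print(merged)  # [('a', 3), ('b', 2), ('c', 4), ('d', 1), ('e', 5)]
```

My question is whether a MapReduce job really performs an extra pass like this, and what it costs.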