The timing of execution of Reducers is determined by the configuration parameter mapreduce.job.reduce.slowstart.completedmaps (in mapred-site.xml). By default it is set to "0.05", which means the Reducers are scheduled for execution once about 5% of the Mappers have completed.
You can tweak this parameter to achieve different results. For example, setting it to "1.0" ensures that the Reducers start only after 100% of the Mappers have completed.
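If you prefer to set this per job instead of cluster-wide in mapred-site.xml, a minimal sketch using the Java Configuration API (the class and job names here are placeholders for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Start Reducers only after every Mapper has finished (1.0f),
        // instead of the default 0.05f (about 5% of Mappers completed).
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
        Job job = Job.getInstance(conf, "slowstart-demo");
        // ... set mapper/reducer classes, input/output paths, then submit ...
    }
}
```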
Reducer tasks start copying data from the Mappers that have completed execution. But the reduce() method is called only after the data from all the Mappers has been copied by the Reducer.
This link: When do reduce tasks start in Hadoop? explains this process clearly.
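To make the ordering concrete, here is a minimal reducer sketch (word-count-style key/value types are assumed purely for illustration); by the time the framework invokes reduce(), the copy/shuffle and merge phases have already fetched all map outputs for this reducer's partition:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // When reduce() runs, the shuffle (copy) and merge phases have
        // already gathered this key's values from every completed Mapper.
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```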
As for speculative execution, it is triggered only for Mappers/Reducers that are lagging behind the other Mappers/Reducers. If the same Mapper instance is executed in duplicate, that does not mean the counters are also duplicated. Task counters are maintained per task attempt. If a task attempt fails or is killed (e.g. due to speculative execution), the counters for that attempt are dropped. So speculative execution has no impact on the overall counter values.
One thing you must remember is that counter values are definitive only once a job has completed successfully.
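A hedged illustration of that last point: read counters only after waitForCompletion() returns true, at which point attempts that failed or were killed (e.g. the losers of speculative execution) no longer contribute to the totals. The job setup below is omitted and assumed to happen elsewhere; the class and job names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "counter-demo");
        // ... set mapper/reducer classes, input/output paths ...

        if (job.waitForCompletion(true)) {
            // Counters are definitive only here, after successful completion.
            // Failed or killed attempts (including speculative duplicates)
            // do not contribute to these aggregated values.
            Counters counters = job.getCounters();
            long mapInputRecords =
                    counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
            System.out.println("MAP_INPUT_RECORDS = " + mapInputRecords);
        }
    }
}
```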