I am trying to write a MapReduce program that computes the average of some statistics.
Each mapper reads the data in its own segment and applies some filters.
I am using multiple reducers.
Each reducer can therefore compute only the local average for its own partition. However, I need the average over all the data going to all the reducers. How do I pull this off?
One idea is to use global counters to hold the sum and the count. But then I need a piece of code that runs after all the reducers have finished (so that I can operate on the final sum and count) and writes the average to a file. Is this a viable approach, and how can I do it?
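To make the idea concrete, here is a minimal plain-Java sketch of the arithmetic I have in mind. It uses no Hadoop APIs at all; `GlobalAverageSketch`, `Partial`, and the sample numbers are made up purely for illustration. It assumes each reducer can report the sum and record count of its partition, and it shows why combining those gives a different result from averaging the local averages.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch only; no Hadoop APIs are used. All names are hypothetical.
class GlobalAverageSketch {

    // What each reducer would need to report: its partition's sum and record count.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // The step that has to run once, after every reducer has finished.
    static double globalAverage(List<Partial> partials) {
        double totalSum = 0.0;
        long totalCount = 0L;
        for (Partial p : partials) {
            totalSum += p.sum;
            totalCount += p.count;
        }
        if (totalCount == 0L) {
            throw new IllegalArgumentException("no records to average");
        }
        return totalSum / totalCount;
    }

    // Averaging the local averages, shown only for contrast with globalAverage.
    static double averageOfLocalAverages(List<Partial> partials) {
        double total = 0.0;
        for (Partial p : partials) {
            total += p.sum / p.count;
        }
        return total / partials.size();
    }

    public static void main(String[] args) {
        List<Partial> partials = Arrays.asList(
                new Partial(10.0, 2),   // reducer 0: local average 5.0
                new Partial(50.0, 3));  // reducer 1: local average about 16.67
        System.out.println(globalAverage(partials));          // prints 12.0
        System.out.println(averageOfLocalAverages(partials)); // differs, since the partitions are unequal in size
    }
}
```

What I am missing is where the equivalent of `globalAverage` would live in a real job, given that it must see the totals from every reducer.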
Also note that I have to use multiple reducers, so using a single reducer and computing the average in its cleanup method is not an option.