
A combiner works on the output records of the mapper. If the mapper output records are fed to the combiner, why does my job report more combiner input records than mapper output records?

I got 83 extra records (Combine input records=80000083 vs. Map output records=80000000). I have no idea where they came from or what their values are.

YARN counter dump for the MapReduce job:

 Map-Reduce Framework
            Map input records=80000000
            Map output records=80000000
            Map output bytes=2560000000
            Map output materialized bytes=80
            Input split bytes=220
            Combine input records=80000083
            Combine output records=85
            Reduce input groups=1
            Reduce shuffle bytes=80
            Reduce input records=2
            Reduce output records=3
            Spilled Records=87
            Shuffled Maps =2
            Failed Shuffles=0
            Merged Map outputs=2
            GC time elapsed (ms)=4124
            CPU time spent (ms)=90530
            Physical memory (bytes) snapshot=573521920
            Virtual memory (bytes) snapshot=2509766656
            Total committed heap usage (bytes)=411041792
shriyog
  • This should help http://stackoverflow.com/questions/12171965/why-is-the-number-of-combiner-input-records-more-than-the-number-of-outputs-of-m – Vignesh I Mar 30 '16 at 06:21
  • @VigneshI Thanks for that. I count the map outputs in my combiner and compute the total across all mappers in the reducer. Is this why I'm getting a wrong count at the reducer? – shriyog Mar 30 '16 at 14:29
  • What kinds of computations can't be done in a combiner, given that the reducer output must not depend on how many times the combiner runs? – shriyog Mar 30 '16 at 14:32
  • That will not be the reason. If you want to count the records, your logic is fine: emit 1 from the mapper, sum the map output in the combiner, then sum all the results in the reducer. – Vignesh I Mar 31 '16 at 08:59
  • This blog post explains in detail when to use and when not to use a combiner: http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/ – Vignesh I Mar 31 '16 at 09:04
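
The emit-1-and-sum approach described in the comments can be sketched as a small plain-Python simulation. It does not use the Hadoop API. The function names, the spill size, and the rule that every spill is combined once and then combined again during the merge are all assumptions made for illustration; real Hadoop re-runs the combiner at merge time only under certain conditions. The point is only to show how a combiner that runs more than once can push "Combine input records" above "Map output records" without changing the final sum.

```python
from collections import defaultdict


def combine(records, counters):
    """Sum values per key, as a summing combiner would, and track counters."""
    counters["combine_input"] += len(records)
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    out = sorted(totals.items())
    counters["combine_output"] += len(out)
    return out


def run_map_task(n_records, spill_size, counters):
    """Hypothetical map task: emit ("count", 1) per input record,
    combine each spill, then combine once more while merging the spills."""
    spills, buffer = [], []
    for _ in range(n_records):
        buffer.append(("count", 1))
        counters["map_output"] += 1
        if len(buffer) == spill_size:
            spills.append(combine(buffer, counters))
            buffer = []
    if buffer:
        spills.append(combine(buffer, counters))
    # Merge phase: the combiner's own outputs are fed back in as inputs.
    merged = [record for spill in spills for record in spill]
    return combine(merged, counters)


def reduce_sum(map_outputs):
    """Reducer: add up the partial sums from every map task."""
    return sum(value for out in map_outputs for _, value in out)


counters = defaultdict(int)
outputs = [run_map_task(1000, 100, counters) for _ in range(2)]
total = reduce_sum(outputs)
extra = counters["combine_input"] - counters["map_output"]
print(dict(counters), "total:", total, "extra combine inputs:", extra)
```

In this sketch the extra combine inputs equal the combine outputs minus the records that finally reach the reducer. The counters in the question show the same pattern (85 - 2 = 83), which is consistent with the combiner having been re-run on its own spill outputs, although the job's actual spill behaviour is not shown in the dump. Summing values stays correct under repeated combining because addition is associative; a combiner that counted the records it receives, instead of summing their values, would not.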
