A combiner is written using the same class as the reducer, with mostly the same code. But when exactly is it called: before sort and shuffle, or just before reduce? If it runs before sort and shuffle, i.e. right after the mapper, how does it get its input as [key, list<values>], given that this grouping is produced by sort and shuffle? And if it runs after sort and shuffle, i.e. just before the reducer, the combiner's output is [key, value] like a reducer's, so how does the reducer then get its input as [key, list<values>]?

vanekjar
Abhi
    Possible duplicate of [On what basis mapreduce framework decides whether to launch a combiner or not](http://stackoverflow.com/questions/17160852/on-what-basis-mapreduce-framework-decides-whether-to-launch-a-combiner-or-not) – Andrea Mar 08 '17 at 14:18

4 Answers


A combiner's input and output types must both match the mapper's output types. Hadoop makes no guarantee about how many times the combiner is applied, or that it is applied at all.

If your mapper extends `Mapper<K1, V1, K2, V2>` and your reducer extends `Reducer<K2, V2, K3, V3>`, then the combiner must extend `Reducer<K2, V2, K2, V2>`.

The combiner is applied on the same machine as the map operation, and definitely before the shuffle.

To quote the Hadoop documentation:

When the map operation outputs its pairs they are already available in memory. For efficiency reasons, sometimes it makes sense to take advantage of this fact by supplying a combiner class to perform a reduce-type function. If a combiner is used then the map key-value pairs are not immediately written to the output. Instead they will be collected in lists, one list per each key value. When a certain number of key-value pairs have been written, this buffer is flushed by passing all the values of each key to the combiner's reduce method and outputting the key-value pairs of the combine operation as if they were created by the original map operation.

http://wiki.apache.org/hadoop/HadoopMapReduce
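To make the quoted behaviour concrete, here is a minimal plain-Python sketch of such a buffer. It is not Hadoop code: `CombiningBuffer`, `flush_threshold` and `sum_combiner` are hypothetical names chosen for illustration, and the threshold of 4 pairs is arbitrary. It shows how the combiner can receive [key, list<values>] before any shuffle: the grouping happens locally, inside the map task's own buffer.

```python
from collections import defaultdict


def sum_combiner(key, values):
    """Reduce-type function: collapse the buffered values of one key."""
    yield key, sum(values)


class CombiningBuffer:
    """Toy stand-in for the map-side buffer described in the quote."""

    def __init__(self, combiner, flush_threshold):
        self.combiner = combiner
        self.flush_threshold = flush_threshold  # hypothetical knob, counted in pairs
        self.lists = defaultdict(list)          # one list per key
        self.pending = 0
        self.output = []                        # pairs that leave the map task

    def write(self, key, value):
        # Map output is collected instead of being emitted immediately.
        self.lists[key].append(value)
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Pass all values of each key to the combiner; its output looks
        # exactly like ordinary map output: plain (key, value) pairs.
        for key in sorted(self.lists):
            self.output.extend(self.combiner(key, self.lists[key]))
        self.lists.clear()
        self.pending = 0


def run_map_task(words, flush_threshold=4):
    buf = CombiningBuffer(sum_combiner, flush_threshold)
    for word in words:
        buf.write(word, 1)
    buf.flush()  # flush whatever is left when the task ends
    return buf.output


map_output = run_map_task(["a", "b", "a", "a", "b", "a"])
print(map_output)
```

Note that the same key can appear more than once in the map task's output, because the buffer was flushed twice. That is why the combiner has to emit the same types as the mapper: the framework still has to sort, shuffle and reduce these pairs afterwards.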

vanekjar
  • If it runs before the shuffle, that means it takes its input from the mapper. But the combiner's input is [key, list], and that kind of output comes from the sort and shuffle phase, so how can it run before sort and shuffle? – Abhi Jul 10 '15 at 05:03
  • I clarified my answer a little more. Please have a look. – vanekjar Jul 10 '15 at 19:44

A combiner is like a pre-reducer: it is applied right after the map phase, before the sort and shuffle phase.

It is applied on the same host where the map phase ran, which minimises the data transferred across the network for the next phases of processing (sort and shuffle, then reduce).

Because of this optimisation, the actual reduce phase has less work to do, which results in better performance.

Shivaprasad
  • Yes, that is a correct description of what the combiner does, but my question is where exactly it is called in the pipeline of mapper, sort and shuffle, and reducer. – Abhi Jul 10 '15 at 05:04
  • It actually runs after the map phase and before sort and shuffle. After the map phase, the output is pipelined to the sort and shuffle phase, and the combiner acts before that. It's like Map -> Combiner -> Sort and Shuffle -> Reducer – Shivaprasad Jul 10 '15 at 06:03

It actually runs after the map phase and before sort and shuffle. After the map phase, the output is pipelined to the sort and shuffle phase, and the combiner acts before that. It's like Map -> Combiner -> Sort and Shuffle -> Reducer
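The ordering above can be simulated in a few lines of plain Python. This is only a sketch under simplifying assumptions, not Hadoop's API: `run_job`, `map_words`, `sum_values` and `group_sorted` are made-up names, each inner list stands for one map task's input, and the combiner runs exactly once per map task. It shows that the reducer still receives [key, list<values>], because the combiner emits ordinary (key, value) pairs and the shuffle regroups them afterwards.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def sum_values(key, values):
    """Used both as combiner and as reducer: (key, [v...]) -> (key, sum)."""
    yield key, sum(values)


def group_sorted(pairs):
    """Sort pairs by key and yield (key, [values]) for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def run_job(splits, combiner=None):
    shuffled = []  # pairs that would cross the network
    for lines in splits:  # one iteration per simulated map task
        pairs = [pair for line in lines for pair in map_words(line)]
        if combiner is not None:
            # Local grouping on the map side, then combine.
            pairs = [out
                     for key, values in group_sorted(pairs)
                     for out in combiner(key, values)]
        shuffled.extend(pairs)
    result = {}
    # Sort and shuffle: regroup everything by key for the reducer.
    for key, values in group_sorted(shuffled):
        for out_key, out_value in sum_values(key, values):
            result[out_key] = out_value
    return result, len(shuffled)


splits = [["a b a", "b a"], ["a c", "c c"]]
plain, plain_pairs = run_job(splits)
combined, combined_pairs = run_job(splits, combiner=sum_values)
print(plain, plain_pairs)
print(combined, combined_pairs)
```

Both runs produce the same counts, but the run with the combiner sends fewer pairs through the shuffle. This only works because summation is associative and commutative; a combiner that changed the final result when applied zero, one or several times would not be safe.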

Shivaprasad
  • I'm sorry, I don't think so. I think the order is: map -> buffer in memory -> partition -> sort -> combiner -> spill to disk -> reduce – StrongYoung Jan 14 '18 at 11:35

The MapReduce framework will not always call the combiner, even if you write a custom one. It will certainly call it if the number of spills is at least 3 (the default). This threshold, the minimum number of spills for which the combiner runs, can be set through the `min.num.spills.for.combine` property (`mapreduce.map.combine.minspills` in newer Hadoop releases).
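As a rough illustration of the rule stated above, the check can be written as a tiny predicate. This is plain Python, not Hadoop source: `should_run_combiner` and its parameters are hypothetical names, and the sketch models only the spill-count condition described in this answer, nothing else the framework may do.

```python
MIN_SPILLS_FOR_COMBINE = 3  # default value quoted in the answer


def should_run_combiner(num_spills, combiner_configured,
                        min_spills=MIN_SPILLS_FOR_COMBINE):
    """Return True when a combiner is set and enough spills occurred."""
    return combiner_configured and num_spills >= min_spills


for spills in (1, 2, 3, 4):
    print(spills, should_run_combiner(spills, combiner_configured=True))
```

Lowering the threshold makes the combiner run for jobs with fewer spills, at the cost of extra CPU work on the map side.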

sschale
steve brad