
I have a 4 GB file with about 16 million lines. The maps run distributed, 6 in parallel out of 15 maps in total, and the job generates 35,000 keys. I am using MultipleTextoutput, so each reducer generates output independently of the other reducers.

I have configured the conf with 25-50 reducers, but it always runs 1 reducer at a time.

Machine: a single 4-core machine with 32 GB of RAM, running the Hortonworks stack.

How do I get more than 1 reduce task to run in parallel?

Hari

2 Answers

0

Have a look at the Hadoop MapReduce Tutorial:

How Many Reduces?

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.
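To make the heuristic quoted above concrete, here is a minimal sketch that multiplies each factor by the number of nodes times the maximum containers per node, which is the product the Hadoop tutorial describes. The class name, the method name, the rounding down, and the value of 4 containers per node are assumptions made for illustration; they do not come from the question or from the tutorial.

```java
// Sketch of the "How Many Reduces?" heuristic from the Hadoop MapReduce Tutorial.
// ReducerEstimate and estimate() are hypothetical names, not Hadoop APIs.
class ReducerEstimate {

    // factor is 0.95 (single wave) or 1.75 (two waves, better load balancing).
    // Rounding down is an assumption; the tutorial does not specify rounding.
    static int estimate(int nodes, int maxContainersPerNode, double factor) {
        if (nodes <= 0 || maxContainersPerNode <= 0) {
            throw new IllegalArgumentException("nodes and containers must be positive");
        }
        return (int) Math.floor(factor * nodes * maxContainersPerNode);
    }

    public static void main(String[] args) {
        int nodes = 1;              // single machine, as in the question
        int containersPerNode = 4;  // hypothetical value, not given in the question

        System.out.println("single wave: " + estimate(nodes, containersPerNode, 0.95));
        System.out.println("two waves:   " + estimate(nodes, containersPerNode, 1.75));
    }
}
```

The result would then be passed to the job as its number of reduce tasks. Note that this only sizes the job; how many of those reducers actually run at the same time depends on how many containers the cluster can allocate.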

Have a look at related SE questions:

How hadoop decides how many nodes will do map and reduce tasks

What is Ideal number of reducers on Hadoop?

Ravindra babu
  • My question is not about the number of reducers, but about how to get the reducers to run in parallel/simultaneously. Thanks! – Hari Mar 24 '16 at 20:02
  • The framework decides the number of reducers, and it is 1 in your case. If this number is more than 1, they will run in parallel. If you want to override it, implement a custom partitioner and set the number of reducers. – Ravindra babu Mar 25 '16 at 02:03
0

After I lowered the reducer memory to 2 GB (the default in mapred-site.xml was 6 GB), the framework brings up 3 reducers in parallel rather than 1.
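The arithmetic behind this can be sketched roughly as follows, assuming the scheduler hands out reduce containers until the memory available to them runs out. The 7168 MB budget is a hypothetical figure chosen only because it is consistent with the observed 1 versus 3 reducers; the actual memory available to YARN on this machine was not reported. The reducer container size is normally controlled by the `mapreduce.reduce.memory.mb` property.

```java
// Rough sketch: how many reduce containers fit into a fixed memory budget.
// ReducerConcurrency and concurrentContainers() are hypothetical names, not Hadoop APIs.
class ReducerConcurrency {

    // Integer division: a container that only partly fits is not started.
    static int concurrentContainers(int availableMb, int containerMb) {
        if (containerMb <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        return availableMb / containerMb;
    }

    public static void main(String[] args) {
        int availableMb = 7168; // hypothetical budget, not taken from the question

        // 6 GB per reducer (the old default in mapred-site.xml)
        System.out.println("6 GB reducers: " + concurrentContainers(availableMb, 6144));
        // 2 GB per reducer (the lowered setting)
        System.out.println("2 GB reducers: " + concurrentContainers(availableMb, 2048));
    }
}
```

This ignores the memory taken by the application master and by any map containers still running, so treat it as an approximation rather than the scheduler's exact behaviour.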

Hari