
I am using Hadoop for a university assignment and I have the code working; however, I'm running into a small issue.

I am trying to set the number of reducers to 19 (0.95 * capacity, as the docs suggest; the cluster's reduce task capacity is 20, and 0.95 * 20 = 19). However, when I view my job in the task tracker it says 1 reducer in total.

System.err.println("here");
job.setNumReduceTasks(19);
System.err.println(job.getNumReduceTasks());

Yields as expected:

here
19

But in the final output I get:

12/05/16 11:10:54 INFO mapred.JobClient:     Data-local map tasks=111
12/05/16 11:10:54 INFO mapred.JobClient:     Rack-local map tasks=58
12/05/16 11:10:54 INFO mapred.JobClient:     Launched map tasks=169
12/05/16 11:10:54 INFO mapred.JobClient:     Launched reduce tasks=1

The parts of the MapReduce job I have overridden are:

  • Mapper
  • Reducer
  • Partitioner
  • Grouping comparator

My first thought was that the partitioner was returning the same value for every key. I checked this, and it was not the case.
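
For illustration, a partitioner that spreads keys correctly looks roughly like this (a hypothetical sketch assuming Text keys and values, not my actual assignment code):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch: spread Text keys across all available reducers.
public class ExamplePartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

(Note that even a partitioner that returned the same value for every key would not change the "Launched reduce tasks" counter; it would only leave most of the reducers with empty input.)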

I have also checked that the grouper works correctly.

I am not sure what else could be causing this. If anyone could help it would be much appreciated.

I am very much an anti-Java person, so please use very explicit examples if you can.

PS: I did not set this cluster up; it was set up by the university, so I am unsure of any configuration variables. PPS: There was too much code to post, so please let me know which code in particular you would like to see.

Edit: I was asked the following questions by TejasP:

Are you really running the code on Hadoop, or is it in local mode? (See if your jobs are seen on the jobtracker and tasktracker.)

Yes I am; it is viewable in the jobtracker UI, which also reports 1 reducer. Note: the job's settings XML also lists the reducers as 1.

Have you exported the HADOOP variables in the environment?

Yes; they are visible in env, and the code does not compile until I have set them.

env | grep HADOOP
HADOOP_HOME=/mnt/biginsights/opt/ibm/biginsights/IHC
HADOOP_CONF_DIR=/mnt/biginsights/opt/ibm/biginsights/hadoop-conf

Is the cluster single-node or multi-node? AND: Even if the cluster has multiple nodes, are all the nodes healthy? Is there an issue with the other nodes?

Yes, there are multiple nodes (10). The jobtracker reports:

Nodes: 10
Map Task Capacity: 20
Reduce Task Capacity: 20
Blacklisted Nodes: 0

Are you using setNumReduceTasks correctly? As stated above, I have called set and then get, and got back the value it was meant to be (19), but the final job still only used 1.

You can reduce your code to a small map-reduce job by removing details (this is just for debugging). Run it and see what happens. If you face the same issue, provide the reduced code in the original question.

I will try to edit again with the results.

  • You are able to see what is in the configuration XML of your job (the blue link near "Job File" in your job view in the web frontend). What value is mapped to the key "mapred.reduce.tasks"? – Thomas Jungblut May 16 '12 at 12:17
  • The value is mapred.reduce.tasks: 1. What is setting this number? – Nick May 16 '12 at 12:26
  • I believe that this is a bug. You can set the value directly via your configuration; I guess this is job.set("mapred.reduce.tasks", "19"). Actually, the method should do this correctly. – Thomas Jungblut May 16 '12 at 12:28
  • I set this variable (though I had to set it on the Configuration object, not the job) but it did not change the value in the job.xml, which is strange. – Nick May 16 '12 at 12:45
  • Well then you are in trouble ;) Do you know if the Hadoop of IBM BigInsights has a bug (assuming that you are using it, based on your paths)? can you post the code where you are setting up the job please? – Thomas Jungblut May 16 '12 at 12:46
  • 1
    Sigh I just found a line in my code (which i copied from an example) that set the reducer count to 1 much further down the code. Really sorry to have wasted your time with this simple oversight. Your help is much appreciated! :) – Nick May 16 '12 at 12:59
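
In other words, the oversight above comes down to a later call overwriting an earlier one: setNumReduceTasks simply writes mapred.reduce.tasks into the job configuration each time it is called, so the last call before submission wins. Schematically:

job.setNumReduceTasks(19);  // intended value; getNumReduceTasks() reports 19 here
// ... many lines of driver setup copied from an example ...
job.setNumReduceTasks(1);   // later call silently overwrites the submitted value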

2 Answers


It looks like you are running it in LocalJobRunner mode (most likely from Eclipse). In this mode, if the number of reduce tasks is > 1, Hadoop resets the number to 1. Take a look at the following few lines from LocalJobRunner.java:

int numReduceTasks = job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    // we only allow 0 or 1 reducer in local mode
    numReduceTasks = 1;
    job.setNumReduceTasks(1);
}
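
If you want to confirm from code which mode the job will run in, you can inspect the jobtracker address: in Hadoop 1.x, mapred.job.tracker defaults to "local", which selects LocalJobRunner. A minimal check might look like this (a sketch assuming the classic 1.x configuration key):

import org.apache.hadoop.conf.Configuration;

public class ModeCheck {
    public static void main(String[] args) {
        // Loads core-site.xml / mapred-site.xml from the classpath.
        Configuration conf = new Configuration();
        String tracker = conf.get("mapred.job.tracker", "local");
        System.out.println("mapred.job.tracker = " + tracker);
        if ("local".equals(tracker)) {
            System.out.println("Local mode: at most one reduce task will run.");
        }
    }
}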
  • Thank you for your answer. I hope it is helpful for others, but my problem was caused by my own oversight; please see my comment on the OP. – Nick Aug 29 '12 at 10:58

A few points that you need to consider:

  1. Are you really running the code on Hadoop, or is it in local mode? (See if your jobs are seen on the jobtracker and tasktracker.)
  2. Have you exported the HADOOP variables in the environment?
  3. Is the cluster single-node or multi-node?
  4. Even if the cluster has multiple nodes, are all the nodes healthy? Is there an issue with the other nodes?
  5. Are you using setNumReduceTasks correctly? You can reduce your code to a small map-reduce job by removing details (this is just for debugging). Run it and see what happens. If you face the same issue, provide the reduced code in the original question (a sketch of such a reduced job follows below).
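
For example, a stripped-down driver might look like the following sketch (identity mapper and reducer, placeholder class and path names; adapt the key/value types to your data):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinimalDebugJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "minimal-debug-job");
        job.setJarByClass(MinimalDebugJob.class);
        // Identity mapper and reducer: records pass straight through,
        // so only the reducer count is being exercised.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        // The default TextInputFormat yields LongWritable offsets and Text lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(19); // the setting under test
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If 19 reduce tasks launch here but only 1 launches in your full job, something later in your driver is overriding the value.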
Tejas Patil
  • Thanks, I have edited my original question to include the answers to those questions (except the reduced code, which I'm working on). – Nick May 16 '12 at 12:08