
I need to pass the rowkey along to the Reducer: the rowkey is calculated in advance, and that information is no longer available by the time the Reducer runs. (The Reducer executes a Put.)

First I tried to just use inner classes, e.g.

public class MRMine {
  private byte[] rowkey;
  public void start(Configuration c, Date d) {
    // calc rowkey based on date
    TableMapReduceUtil.initTableMapperJob(...);
    TableMapReduceUtil.initTableReducerJob(...);
  }
  // non-static inner classes
  public class MyMapper extends TableMapper<Text, IntWritable> {...}
  public class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {...}
}

Both MyMapper and MyReducer define a default constructor, but this approach leads to the following exception:

java.lang.RuntimeException: java.lang.NoSuchMethodException: com.mycompany.MRMine$MyMapper.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NoSuchMethodException: com.company.MRMine$MyMapper.<init>()
    at java.lang.Class.getConstructor0(Class.java:2730)
    at java.lang.Class.getDeclaredConstructor(Class.java:2004)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)

I got rid of the exception by declaring the inner classes static, but then I'd have to make the rowkey static as well, and I'm running multiple jobs in parallel.
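The NoSuchMethodException itself can be reproduced with plain reflection: a non-static inner class has no true no-arg constructor (its constructor implicitly takes the enclosing instance), which is exactly what Hadoop's ReflectionUtils.newInstance trips over. A minimal sketch (class names are illustrative, not from the original code):

```java
import java.lang.reflect.Constructor;

public class InnerClassReflectionDemo {
    class Inner {}          // non-static: its only constructor takes the outer instance
    static class Nested {}  // static: has a genuine no-arg constructor

    public static void main(String[] args) throws Exception {
        try {
            // This is essentially what ReflectionUtils.newInstance attempts.
            Inner.class.getDeclaredConstructor();
            System.out.println("inner: no-arg constructor found");
        } catch (NoSuchMethodException e) {
            // This branch is taken: the same failure Hadoop reports.
            System.out.println("inner: NoSuchMethodException");
        }
        Constructor<Nested> c = Nested.class.getDeclaredConstructor(); // succeeds
        System.out.println("nested: " + (c != null));
    }
}
```

This is why declaring the inner classes static makes the exception go away, at the cost of losing access to the enclosing instance's fields.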

I found https://stackoverflow.com/a/6739905/1338732 where the Reducer's configure method is overridden, but that method doesn't seem to be available anymore in the new API. In any case, I wouldn't be able to pass a value along that way.

I was thinking of (mis)using the Configuration by just adding a new key-value pair. Would this work, and is it the correct approach?

Is there a way to pass along any custom value to the reducer?

The versions I'm using are: HBase 0.94.6.1, Hadoop 1.0.4.

divadpoc

2 Answers


Your problem statement is a little unclear; however, I think something like this is what you are looking for.

The way I currently pass information to the reducer is through the configuration.

In the job setup, do the following:

conf.set("someName","someValue");

This creates an entry in the configuration named someName with value someValue. It can later be retrieved in the Mapper/Reducer (typically in the setup method, where the context is available) by doing the following:

Configuration conf = context.getConfiguration();
String someVariable = conf.get("someName");

The current code will set the value of someVariable to "someValue", allowing the information to be passed to the reducer.

To pass multiple values, use setStrings(). I haven't tested this function yet, but according to the documentation it should work with one of the following two options (the documentation is a little unclear, so try both and use whichever works):

conf.setStrings("someName","value1,value2,value3");
conf.setStrings("someName","value1","value2","value3");

Retrieve using (note that getStrings() returns a String[], not a String):

Configuration conf = context.getConfiguration();
String[] someValues = conf.getStrings("someName");
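For what it's worth, the round trip above (including the comma handling that makes the two setStrings variants end up equivalent) can be sketched with a tiny stand-in class. MiniConf below is illustrative only, not Hadoop's Configuration, but it mirrors the documented behavior: setStrings joins its varargs with commas, and getStrings splits the stored value on commas.

```java
import java.util.HashMap;
import java.util.Map;

// MiniConf is a toy stand-in for org.apache.hadoop.conf.Configuration,
// used only to illustrate the set/get and setStrings/getStrings round trip.
class MiniConf {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) {
        props.put(name, value);
    }

    public String get(String name) {
        return props.get(name);
    }

    // Joins the varargs with commas before storing, as Hadoop does...
    public void setStrings(String name, String... values) {
        props.put(name, String.join(",", values));
    }

    // ...and splits on commas when reading back, which is why passing
    // "value1,value2,value3" or ("value1", "value2", "value3") behaves the same.
    public String[] getStrings(String name) {
        String raw = props.get(name);
        return raw == null ? null : raw.split(",");
    }
}

public class MiniConfDemo {
    public static void main(String[] args) {
        MiniConf conf = new MiniConf();

        conf.set("someName", "someValue");
        System.out.println(conf.get("someName"));            // someValue

        conf.setStrings("keys", "value1", "value2", "value3");
        System.out.println(conf.getStrings("keys").length);  // 3
    }
}
```

In the real job you would of course use the Configuration from the Job object before submission and context.getConfiguration() inside the Mapper/Reducer.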

Hope this helps.

Davis Broda
  • thanks, that works. i'll take it as the "way to go" in order to pass along values – divadpoc Oct 18 '13 at 15:05
  • yes, this is the typical way to pass small configuration values to mappers or reducers. If you need to pass large amounts of data, using the distributed cache is preferred. – David Oct 18 '13 at 15:44

The goal is a little unclear, but I have found that for many types of jobs involving HBase, you do not need a reducer to put data into HBase. The mapper reads a row, modifies it in some way, then writes it back.

Obviously there are jobs for which that is inappropriate (any type of aggregation for example), but the reduce stage can really slow down a job.

David
  • the goal is to pass along information to the `Reducer` (I need it as I aggregate data). but thanks for the information, i'll keep it in mind if I come upon a use case where no reducer is needed – divadpoc Oct 19 '13 at 17:45