
I am trying to get the number of input records (lines) read by the mappers by running

    job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue()

This works once the job has completed, but I want to do the same thing in the setup phase of the reducer. I have tried overriding `setup` in two ways, following the page "Accessing a mapper's counter from a reducer". However, neither works: both throw a NullPointerException.

    // Attempt 1: new API (org.apache.hadoop.mapreduce.Cluster).
    // Only one of these two setup overrides can exist in the reducer class at a time.
    @Override
    protected void setup(Reducer<Text, DoubleWritable, Text, DoubleWritable>.Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Cluster cluster = new Cluster(conf);
        //RunningJob job = (RunningJob) cluster.getJob(context.getJobID());
        Job job = cluster.getJob(context.getJobID());
        System.out.println(job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
    }

    // Attempt 2: old API (org.apache.hadoop.mapred.JobClient).
    @Override
    protected void setup(Reducer<Text, DoubleWritable, Text, DoubleWritable>.Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        JobClient client = new JobClient(conf);
        RunningJob job = client.getJob(JobID.forName(conf.get("mapred.map.id")));
        //Job job = (Job) client.getJob((JobID) context.getJobID());
        System.out.println(job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
    }

Can anyone help me fix the problem? Thanks a lot.

  • The reducer would get a subsection of the data read by the entire map-phase (one key per reducer) so why would you need this? – OneCricketeer Aug 31 '18 at 02:59
  • Actually, this is for a TFIDF calculation job. The MAP_INPUT_RECORDS is the number of docs in the entire collection. I want to get that number so that I can do my calculation in my reducer. :) – Jintao Wang Aug 31 '18 at 03:03
  • So, is the error from `job.getCounters()` being null? As far as I know, `cluster.getJob(context.getJobID())` should return you the right information – OneCricketeer Aug 31 '18 at 03:04
  • Yes, cluster.getJob(context.getJobID()) works and the error is from job.getCounters(....). – Jintao Wang Aug 31 '18 at 03:06
  • Can you print the counters object or loop over what data *is* within it? – OneCricketeer Aug 31 '18 at 03:11
  • Well, if I do `Counters counters = job.getCounters()`, that is where the NullPointerException comes from, so I get nothing in `counters`. I also tried declaring the job with the `RunningJob` class, and that does not work either. – Jintao Wang Aug 31 '18 at 03:33
  • `conf.get("mapred.map.id")` shouldn't work. That property is deprecated... Getting the job ID from the context is correct, though, if you look at the JavaDoc, it says getting the counters, *May return null if the job has been retired and the job is no longer in the completed job store* – OneCricketeer Aug 31 '18 at 05:34
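For context on why the total document count matters here: in the TF-IDF use case described in the comments, MAP_INPUT_RECORDS stands in for N, the number of documents, which is only true if each input record is one document. The plain-Java sketch below is not from the thread and does not use Hadoop. It assumes one common TF-IDF variant (raw term count times `ln(N / df)`, no smoothing); the asker's exact formula is not stated. The class and method names (`TfIdfSketch`, `idf`, `tfIdf`, `idfPerTerm`) are hypothetical. It runs as two in-memory passes to show that N has to be fully known before any IDF value can be computed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; assumes the unsmoothed variant idf(t) = ln(N / df(t)).
public class TfIdfSketch {

    // IDF with natural log and no smoothing.
    static double idf(long numDocs, long docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // TF-IDF weight: raw count of the term in one document times the term's IDF.
    static double tfIdf(long termCount, long numDocs, long docFreq) {
        return termCount * idf(numDocs, docFreq);
    }

    // In-memory analogue of two passes: first learn N, then compute IDF per term.
    static Map<String, Double> idfPerTerm(List<List<String>> docs) {
        long numDocs = docs.size(); // pass 1: total number of documents (N)
        Map<String, Long> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            Set<String> seen = new HashSet<>(doc); // count each term once per document
            for (String term : seen) {
                docFreq.merge(term, 1L, Long::sum);
            }
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Long> e : docFreq.entrySet()) {
            result.put(e.getKey(), idf(numDocs, e.getValue())); // pass 2: needs the final N
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("hadoop", "counter"),
                List.of("hadoop", "reducer"),
                List.of("tfidf"));
        Map<String, Double> idf = idfPerTerm(docs);
        System.out.println(idf.get("hadoop")); // ln(3/2), since "hadoop" is in 2 of 3 docs
    }
}
```

In a MapReduce setting, the same ordering constraint is why a counter that is still accumulating during the job is an awkward source for N inside a reducer's `setup`.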
