MapReduce output as ArrayList

Question

How can I call a MapReduce method from a normal Java project? Is it possible to return the reducer output as an ArrayList / HashMap instead of a flat file? And how can I access the MapReduce method from a JBoss app server?

Have a look at how Apache MRUnit does this; you can actually use it for your requirements. — Thomas Jungblut, Jul 01 '13 at 10:05
@ThomasJungblut, I went through your `http://stackoverflow.com/questions/9849776/calling-a-mapreduce-job-from-a-simple-java-program` to invoke the MapReduce method from a remote server and it went fine, but how do I get the MapReduce output on the remote machine that invoked the job? — Jeevanantham, Jul 01 '13 at 12:30
@ThomasJungblut, can you share any sample code that uses MultipleOutputs to save multiple files? — Jeevanantham, Jul 02 '13 at 05:40

score 0 · Answer 1 · answered Jul 03 '13 at 07:00

Here is a sample reducer that uses MultipleOutputs:

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum all values for this key
        int total = 0;
        while (values.hasNext()) {
            total += values.next().get();
        }
        // Emit the final total once per key to each named output
        // (collecting inside the loop would write every running total)
        IntWritable result = new IntWritable(total);
        mos.getCollector("text", reporter).collect(key, result);
        mos.getCollector("seq", reporter).collect(key, result);
    }

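You need to create a MultipleOutputs instance in the configure method, and close it in close() so that the named outputs are flushed.

    private MultipleOutputs mos;

    @Override
    public void configure(JobConf job) {
        // Create the MultipleOutputs helper once per task
        mos = new MultipleOutputs(job);
    }

    @Override
    public void close() throws IOException {
        // Flush and close all named outputs
        mos.close();
    }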

In your driver class you need to declare which output formats you want to use. The following generates your output in both text and sequence-file formats.

    // Defines an additional text-based named output 'text' for the job
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
            Text.class, IntWritable.class);

    // Defines an additional sequence-file-based named output 'seq' for the job
    MultipleOutputs.addNamedOutput(conf, "seq",
            SequenceFileOutputFormat.class, Text.class, IntWritable.class);

From what I understand of your question, though, you basically want to access your MapReduce output from your own code. You can download the output files using the HDFS API, but a better approach would be to load the data into a Hive table and access it over JDBC.
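If you go the HDFS route, getting a HashMap comes down to parsing the text output yourself. Below is a minimal sketch, assuming the job wrote Text/IntWritable pairs with TextOutputFormat and its default tab separator. The class name `ReducerOutputParser` and the sample data are made up for illustration. In a real client you would pass in a reader over the stream returned by `FileSystem.open(path)` for each part file, rather than the `StringReader` stand-in used here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns "key<TAB>count" lines into a Map.
public class ReducerOutputParser {

    public static Map<String, Integer> parse(Reader source) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank or malformed lines
                }
                String key = line.substring(0, tab);
                int value = Integer.parseInt(line.substring(tab + 1).trim());
                counts.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a part file (assumed sample data).
        Map<String, Integer> counts =
                parse(new StringReader("apple\t3\nbanana\t5\n"));
        System.out.println(counts); // prints {apple=3, banana=5}
    }
}
```

A job with several reducers writes several part files, so you would call `parse` once per file and merge the resulting maps. This only makes sense when the output is small enough to fit in the client's memory.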

I am using Hadoop 0.20.2, in which several classes such as JobConf are deprecated. Can you please suggest a stable version of Hadoop that also provides the MultipleOutputs feature? Thanks. — Jeevanantham, Jul 04 '13 at 03:57
You may use the new MR APIs. That is, use org.apache.hadoop.mapreduce.lib.output.MultipleOutputs instead of org.apache.hadoop.mapred.lib.MultipleOutputs. — Arijit Banerjee, Jul 04 '13 at 08:40
