I successfully adapted the Hadoop wordcount program to suit my requirement. However, I now have a situation in which I use the same key for three values. Say my input file is as below.

A Uppercase 1 firstnumber  I  romannumber a lowercase
B Uppercase 2 secondnumber II romannumber b lowercase

Currently in my map/reduce program, I am doing something like below. Here A is the key and 1 is the value.

A 1

I need my map reduce to perform something like below.

A 1 I a 

I can produce each of these outputs with three separate programs, like below.

A 1
A I
A a

However, I want to do this in a single program. Basically, from my map function I want to do this.

context.write(key,value1);
context.write(key,value2);
context.write(key,value3);

Is there any way I can do it in the same program rather than writing three different programs?

EDIT:

Let me provide a clearer example. I need to do something like below.

A uppercase 1 firstnumber  1.0 floatnumber str stringchecking
A uppercase 2 secondnumber 2.0 floatnumber ing stringchecking

My final output would be,

A 3 3.0 string

Here 3 is the sum of the two integers, 3.0 is the sum of the two floats, and string is the concatenation of the two strings.

Ramesh
  • What's wrong with doing what you just proposed? You can definitely emit multiple key/value pairs per `map()`. – Mike Park Jun 20 '13 at 16:03
  • Won't it get confused with the values in the reduce function? Won't it mix up the values together and produce some clumsy output? – Ramesh Jun 20 '13 at 16:05
  • Also, what if my formats are different? For example, "a" is a character and "1" is an integer. So, should I set two mapOutputValueclass? – Ramesh Jun 20 '13 at 16:08
  • Is it always going to be 3 values per key? You can create a custom `Writable`, or use an `ArrayWritable` to define a value that is composed of 3 different values. – Mike Park Jun 20 '13 at 16:16
  • Yeah. It will be always 3 values per key. – Ramesh Jun 20 '13 at 16:18
  • What if I need to do some calculations on these 3 values? In that case, how can I have it as a Writable? – Ramesh Jun 20 '13 at 16:22
  • Let me put together an answer and I can edit it as needed. Just one question, will it always be in the order `int,string,string`? – Mike Park Jun 20 '13 at 16:23

1 Answer

First you'll need a composite writable for all three of your values.

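import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

// A single value type that bundles the three per-key values.
public class CompositeWritable implements Writable {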
    int val1 = 0;
    float val2 = 0;
    String val3 = "";

    public CompositeWritable() {}

    public CompositeWritable(int val1, float val2, String val3) {
        this.val1 = val1;
        this.val2 = val2;
        this.val3 = val3;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        val1 = in.readInt();
        val2 = in.readFloat();
        val3 = WritableUtils.readString(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(val1);
        out.writeFloat(val2);
        WritableUtils.writeString(out, val3);
    }

    public void merge(CompositeWritable other) {
        this.val1 += other.val1;
        this.val2 += other.val2;
        this.val3 += other.val3;
    }

    @Override
    public String toString() {
        return this.val1 + "\t" + this.val2 + "\t" + this.val3;
    }
}
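
The important property of `write` and `readFields` is that they handle the fields in the same order. A minimal sketch of that round trip, using only `java.io` so it runs without Hadoop, might look like the following. The names `RoundTripDemo` and `roundTrip` are hypothetical, and the length-prefixed string layout is an assumption modeled on the description in the comments on this answer, not Hadoop's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for CompositeWritable that needs only java.io.
class RoundTripDemo {
    int val1;
    float val2;
    String val3 = "";

    RoundTripDemo() {}

    RoundTripDemo(int val1, float val2, String val3) {
        this.val1 = val1;
        this.val2 = val2;
        this.val3 = val3;
    }

    // Write the fields in a fixed order; the string is length-prefixed (assumed layout).
    void write(DataOutputStream out) throws IOException {
        out.writeInt(val1);
        out.writeFloat(val2);
        byte[] bytes = val3.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Read the fields back in exactly the order they were written.
    void readFields(DataInputStream in) throws IOException {
        val1 = in.readInt();
        val2 = in.readFloat();
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        val3 = new String(bytes, StandardCharsets.UTF_8);
    }

    // Serialize a value to bytes, then deserialize it into a fresh instance.
    static RoundTripDemo roundTrip(RoundTripDemo original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                original.write(out);
            }
            RoundTripDemo copy = new RoundTripDemo();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RoundTripDemo copy = roundTrip(new RoundTripDemo(1, 1.0f, "str"));
        System.out.println(copy.val1 + "\t" + copy.val2 + "\t" + copy.val3);
    }
}
```

If the read order ever differs from the write order, the fields come back scrambled or the read fails, which is why the two methods should be kept in step.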

Then in your reduce you'll do something like this...

public void reduce(Text key, Iterable<CompositeWritable> values, Context ctx) throws IOException, InterruptedException{

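    // Start from an empty accumulator; an uninitialized local would not compile.
    CompositeWritable out = new CompositeWritable();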

    for (CompositeWritable next : values)
    {
        out.merge(next);
    }

    ctx.write(key, out);
}

Your mapper will simply output one CompositeWritable per call to `map()`.

I haven't tried to compile this, but the general idea is there.
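
For the mapper side, here is a rough sketch of the parsing step in plain Java, so it can be run without a cluster. `ParsedRecord` and `parse` are hypothetical names, and the column positions (0, 2, 4, 6) are assumed from the sample input in the question. In a real `Mapper`, the parsed values would be passed to `context.write` as shown in the comment.

```java
// Hypothetical stand-in for one mapper output record: a key plus three values.
class ParsedRecord {
    final String key;
    int val1;
    float val2;
    String val3;

    ParsedRecord(String key, int val1, float val2, String val3) {
        this.key = key;
        this.val1 = val1;
        this.val2 = val2;
        this.val3 = val3;
    }

    // Assumes whitespace-separated columns:
    // key, label, int, label, float, label, string, label.
    static ParsedRecord parse(String line) {
        String[] cols = line.trim().split("\\s+");
        int i = Integer.parseInt(cols[2]);
        float f = Float.parseFloat(cols[4]);
        String s = cols[6];
        // Inside a Hadoop Mapper this would be emitted as a single value:
        // context.write(new Text(cols[0]), new CompositeWritable(i, f, s));
        return new ParsedRecord(cols[0], i, f, s);
    }

    // Same rule as CompositeWritable.merge: add the numbers, concatenate the strings.
    void merge(ParsedRecord other) {
        this.val1 += other.val1;
        this.val2 += other.val2;
        this.val3 += other.val3;
    }

    @Override
    public String toString() {
        return key + "\t" + val1 + "\t" + val2 + "\t" + val3;
    }

    public static void main(String[] args) {
        ParsedRecord total =
            parse("A uppercase 1 firstnumber  1.0 floatnumber str stringchecking");
        total.merge(
            parse("A uppercase 2 secondnumber 2.0 floatnumber ing stringchecking"));
        System.out.println(total);
    }
}
```

Because the mapper emits one composite value per record, the driver would presumably also need `job.setMapOutputValueClass(CompositeWritable.class)` so that Hadoop knows which type to deserialize on the reduce side.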

Mike Park
  • Just curious, can you use "Text" type for val3 instead of string? – Chaos Jun 20 '13 at 19:37
  • @Chaos I don't see why not. It would just be `val3.readFields(in);` instead of `val3 = WritableUtils.readString(in);`. You can also use `Text.readString(in)` which returns a string. – Mike Park Jun 20 '13 at 19:44
  • Great! So DataInput & DataOutput only read/write integers & floats? – Chaos Jun 20 '13 at 19:57
  • @Chaos Yes primitive types. You can read/write byte arrays which is how strings are stored. They are length encoded with the first 4 bytes (int) describing the length of the string and the number of bytes of the stream to read. – Mike Park Jun 20 '13 at 20:05
  • @climbage could you please help in writing mapper for this helpful piece of code suggested by you. How would this output one CompositeWritable per map? I'm using something like `context.write(new Text(line[0]), new CustomWritable(Integer.parseInt(line[2]),Float.parseFloat(line[4]),line[6]));` in the `Mapper` but it seems to be incorrect as mapper would output data in the format `K,V1 V2 V3` in this case which would disallow reducer to handle such values. Please help. – prashant1988 Apr 07 '15 at 19:38
  • @prashant1988 The whole point of the `CompositeWritable` is so you can represent multiple values as a single value. What do you mean by *seems*? Did you try it? – Mike Park Apr 07 '15 at 20:42
  • @climbage Yes I tried running a MRunit test for the mapper. It gives output in the format `K,V1 V2 V3`. Please assist. Thanks – prashant1988 Apr 08 '15 at 06:43
  • what do you want it to do? – Mike Park Apr 08 '15 at 15:15