
Suppose I have an enumeration:

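import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public enum SomeEnumType implements Writable {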
  A(0), B(1);

  private int value;

  private SomeEnumType(int value) {
    this.value = value;
  }

  @Override
  public void write(final DataOutput dataOutput) throws IOException {
    dataOutput.writeInt(this.value);
  }

  @Override
  public void readFields(final DataInput dataInput) throws IOException {
    this.value = dataInput.readInt();
  }
}

I want to pass an instance of it as part of another class instance.

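The equals method would not help, because it does not consider the enum's inner variable (Enum.equals is final and compares identity). On top of that, all enum instances are fixed at compile time and cannot be created elsewhere, so readFields could only overwrite the value field of an existing, shared constant.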

Does this mean I cannot send enums over the wire in Hadoop, or is there a solution?

starblue
Artem Oboturov

3 Answers


My usual, preferred solution for enums in Hadoop is to serialize them through their ordinal value.

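import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {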

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private int enumOrdinal;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(Enum<?> arbitraryEnum) {
        this.enumOrdinal = arbitraryEnum.ordinal();
    }

    public int getEnumOrdinal() {
        return enumOrdinal;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumOrdinal = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(enumOrdinal);
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
    }

}

This obviously has a drawback: ordinals can change. If you swap ENUM_2 and ENUM_3 in the enum declaration and then read a previously serialized file, you get the wrong constant back.
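To make the reordering problem concrete, here is a minimal Hadoop-free sketch. It writes an ordinal with plain java.io streams (the same writeInt/readInt calls the EnumWritable uses) and reads it back against a reordered declaration. EnumNameV1, EnumNameV2 and OrdinalPitfall are hypothetical names standing in for "the enum before" and "the enum after" someone swaps two constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical declaration in use when the data was written.
enum EnumNameV1 { ENUM_1, ENUM_2, ENUM_3 }

// Hypothetical later declaration: ENUM_2 and ENUM_3 swapped in the source.
enum EnumNameV2 { ENUM_1, ENUM_3, ENUM_2 }

class OrdinalPitfall {

    // Serialize a constant the way the ordinal-based EnumWritable does.
    static byte[] writeOrdinal(Enum<?> e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(e.ordinal());
        }
        return bytes.toByteArray();
    }

    // Deserialize using whatever declaration exists at read time.
    static EnumNameV2 readOrdinalAsV2(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return EnumNameV2.values()[in.readInt()];
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = writeOrdinal(EnumNameV1.ENUM_2); // stores ordinal 1
        System.out.println(readOrdinalAsV2(stored));     // prints ENUM_3, not ENUM_2
    }
}
```

Nothing fails loudly here: the int is perfectly valid, it just points at a different constant, which is why this kind of bug tends to surface late.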

So if you know the enum class beforehand, you can write the enum's name instead and read it back like this:

 enumInstance = EnumName.valueOf(in.readUTF());

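This uses slightly more space, but it is safe against reordering the enum constants. Renaming or removing a constant still breaks previously written data, though: valueOf throws an IllegalArgumentException for a name it doesn't know.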
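A rough sketch of both claims, again with plain java.io streams and hypothetical enums NameV1/NameV2 (ENUM_2 and ENUM_3 swapped). writeInt always takes 4 bytes, while writeUTF takes a 2-byte length prefix plus the encoded name, and reading by name still resolves to the right constant after the reorder:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical enum versions: ENUM_2 and ENUM_3 swap places between releases.
enum NameV1 { ENUM_1, ENUM_2, ENUM_3 }
enum NameV2 { ENUM_1, ENUM_3, ENUM_2 }

class NameVsOrdinal {

    // Ordinal encoding: always a fixed 4-byte int.
    static byte[] writeOrdinal(Enum<?> e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(e.ordinal());
        }
        return bytes.toByteArray();
    }

    // Name encoding: writeUTF emits a 2-byte length prefix plus the
    // modified-UTF-8 bytes of the name (one byte per ASCII character).
    static byte[] writeName(Enum<?> e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(e.name());
        }
        return bytes.toByteArray();
    }

    // Reading by name resolves against the current declaration,
    // so reordering the constants does not change the result.
    static NameV2 readNameAsV2(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return NameV2.valueOf(in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeOrdinal(NameV1.ENUM_2).length);     // 4
        System.out.println(writeName(NameV1.ENUM_2).length);        // 8 (2 + "ENUM_2".length())
        System.out.println(readNameAsV2(writeName(NameV1.ENUM_2))); // ENUM_2
    }
}
```

For typical short constant names the extra cost is a handful of bytes per record.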

The full example would look like this:

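import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {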

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private EnumName enumInstance;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(EnumName e) {
        this.enumInstance = e;
    }

    public EnumName getEnum() {
        return enumInstance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(enumInstance.name());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumInstance = EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = enumWritable.getEnum();

    }

}
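To exercise the name-based class without a cluster, the sketch below copies its write/readFields logic into a hypothetical NamedEnumHolder that drops `implements Writable` (so Hadoop isn't needed on the classpath) and drives it with byte-array streams. One assumption beyond the answer: readFields wraps valueOf's IllegalArgumentException in an IOException, so data written before a constant was renamed fails with the exception type readFields already declares.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Mirrors the name-based EnumWritable, minus "implements Writable".
class NamedEnumHolder {

    enum EnumName { ENUM_1, ENUM_2, ENUM_3 }

    private EnumName enumInstance;

    NamedEnumHolder() { }  // no-arg constructor, as Hadoop requires for Writables

    NamedEnumHolder(EnumName e) { this.enumInstance = e; }

    EnumName getEnum() { return enumInstance; }

    void write(DataOutput out) throws IOException {
        out.writeUTF(enumInstance.name());
    }

    void readFields(DataInput in) throws IOException {
        String name = in.readUTF();
        try {
            enumInstance = EnumName.valueOf(name);
        } catch (IllegalArgumentException e) {
            // Reached when a constant was renamed or removed after the data was written.
            throw new IOException("Unknown EnumName constant: " + name, e);
        }
    }

    // Serialize an instance to bytes, as Hadoop would before shipping it.
    static byte[] toBytes(NamedEnumHolder holder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            holder.write(out);
        }
        return bytes.toByteArray();
    }

    // Deserialize into a fresh instance created with the no-arg constructor.
    static NamedEnumHolder fromBytes(byte[] data) throws IOException {
        NamedEnumHolder holder = new NamedEnumHolder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            holder.readFields(in);
        }
        return holder;
    }

    // Bytes for an arbitrary stored name, e.g. one that no longer exists in the enum.
    static byte[] bytesForName(String storedName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(storedName);
        }
        return bytes.toByteArray();
    }
}
```

`fromBytes(toBytes(new NamedEnumHolder(EnumName.ENUM_2))).getEnum()` gives back ENUM_2, while `fromBytes(bytesForName("ENUM_4"))` throws an IOException instead of an unchecked exception escaping from readFields.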
Thomas Jungblut
  • One addition to this: if you're using an enum in the key, DON'T use enum.hashCode() as part of your key's hashCode method. Enum.hashCode() is the default identity hash inherited from Object (more or less derived from the constant's address in memory), which is JVM dependent. That causes problems when the same key is output from different mappers: they can hash to different reducers. – Chris White Oct 09 '12 at 23:02
  • Exactly, that's where you want to use the `Writable` type and implement hashCode and equals based on the ordinal or name of the enum. – Thomas Jungblut Oct 10 '12 at 07:11
  • So, getting back to the question: it's not possible to do this with the enum itself, only via helper classes. – Artem Oboturov Oct 10 '12 at 09:30
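Following those comments, a key's hashCode can be built from the enum's name (or ordinal) instead of its identity hash. The sketch assumes a hypothetical EventKey holding a user id and an enum kind; a real key would also implement WritableComparable (write, readFields, compareTo), which is omitted here so it compiles without Hadoop. partitionFor reproduces the formula of Hadoop's default HashPartitioner to show where a stable hash matters.

```java
import java.util.Objects;

// Hypothetical composite key: a user id plus an enum-typed event kind.
class EventKey {

    enum Kind { CLICK, VIEW, PURCHASE }

    private final long userId;
    private final Kind kind;

    EventKey(long userId, Kind kind) {
        this.userId = userId;
        this.kind = Objects.requireNonNull(kind);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EventKey)) return false;
        EventKey other = (EventKey) o;
        // Identity comparison of enum constants is fine within one JVM.
        return userId == other.userId && kind == other.kind;
    }

    @Override
    public int hashCode() {
        // String.hashCode and Long.hashCode are defined by their Javadoc, so this
        // value is the same on every JVM. kind.hashCode() would not be.
        return 31 * Long.hashCode(userId) + kind.name().hashCode();
    }

    // Same formula as Hadoop's default HashPartitioner.getPartition.
    static int partitionFor(EventKey key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Using `kind.ordinal()` instead of `kind.name().hashCode()` is equally stable across JVMs, but it inherits the reordering caveat discussed in the answer.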

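Hadoop's WritableUtils has convenience methods that make this easier. They serialize the enum by its name, so they share the trade-offs of the name-based approach: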

WritableUtils.writeEnum(dataOutput, enumData);
enumData = WritableUtils.readEnum(dataInput,MyEnum.class);
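If Hadoop isn't on the classpath (for example in a plain unit test), the same pattern can be approximated with java.io alone. EnumIO is a hypothetical helper that mirrors the shape of writeEnum/readEnum but uses writeUTF; as far as I know, Hadoop's own methods encode the name with its Text string format, so the bytes the two produce are not interchangeable. Treat it as an illustration of the generic `Class<T>` plus Enum.valueOf technique, not a drop-in replacement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical plain-Java stand-in with the same shape as WritableUtils.writeEnum/readEnum.
// NOT wire-compatible with Hadoop: this uses DataOutput.writeUTF for the name.
final class EnumIO {
    private EnumIO() { }

    static void writeEnum(DataOutput out, Enum<?> value) throws IOException {
        out.writeUTF(value.name());
    }

    // The Class<T> argument is what lets one method work for any enum type.
    static <T extends Enum<T>> T readEnum(DataInput in, Class<T> type) throws IOException {
        return Enum.valueOf(type, in.readUTF());
    }

    // Convenience round trip through an in-memory buffer.
    static <T extends Enum<T>> T roundTrip(T value, Class<T> type) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeEnum(out, value);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return readEnum(in, type);
        }
    }
}
```

For example, `EnumIO.roundTrip(java.time.DayOfWeek.FRIDAY, java.time.DayOfWeek.class)` returns FRIDAY.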
aaronman

I don't know anything about Hadoop, but based on the documentation of the interface, you could probably do it like this:

public void readFields(DataInput in) throws IOException {
    // do nothing
}

public static SomeEnumType read(DataInput in) throws IOException {
    int value = in.readInt();
    if (value == 0) {
        return SomeEnumType.A;
    }
    else if (value == 1) {
        return SomeEnumType.B;
    }
    else {
        throw new IOException("Invalid value " + value);
    }
}
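The if/else chain above grows with every constant. A common alternative, sketched here with a hypothetical CodedType enum, keeps the explicit codes from the question but makes the field final (so no code path mutates a shared constant) and resolves codes through a reverse map built once in a static initializer. Because the codes are explicit, reordering the constants doesn't change the wire format, unlike ordinals.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum with explicit, stable wire codes.
enum CodedType {
    A(0), B(1);

    // final: enum constants are shared singletons and must not be mutated.
    private final int code;

    // Reverse index, filled after all constants have been constructed.
    private static final Map<Integer, CodedType> BY_CODE = new HashMap<>();
    static {
        for (CodedType t : values()) {
            BY_CODE.put(t.code, t);
        }
    }

    CodedType(int code) { this.code = code; }

    void write(DataOutput out) throws IOException {
        out.writeInt(code);
    }

    static CodedType read(DataInput in) throws IOException {
        int value = in.readInt();
        CodedType t = BY_CODE.get(value);
        if (t == null) {
            throw new IOException("Invalid value " + value);
        }
        return t;
    }
}
```

Hadoop still calls write/readFields on the enclosing Writable, so a static read like this would be invoked from that class's readFields (and write from its write) rather than replacing them.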
JB Nizet
  • While the static method is a good solution for custom writes to arbitrary files, you still have to implement the interface's methods, because Hadoop calls them internally. Writable works like Externalizable in the Java API. – Thomas Jungblut Oct 09 '12 at 12:50