
I have a class that looks like this...

public class LegionInputFormat
        extends FileInputFormat<NullWritable, LegionRecord> {

    @Override
    public RecordReader<NullWritable, LegionRecord>
            createRecordReader(InputSplit split, TaskAttemptContext context) {

        /* Skipped code for getting recordDelimiterBytes */

        return new LegionRecordReader(recordDelimiterBytes);
    }
}

I'd like to use a generic type so it could return any type of RecordReader specified by the user, like so:

public class LegionInputFormat<T extends RecordReader<NullWritable, LegionRecord>>
        extends FileInputFormat<NullWritable, LegionRecord> {

    @Override
    public RecordReader<NullWritable, LegionRecord>
            createRecordReader(InputSplit split, TaskAttemptContext context) {

        /* Skipped code for getting recordDelimiterBytes */

        return new T(recordDelimiterBytes);
    }
}

As the post title suggests, I'm being told I "cannot instantiate the Type T." From other Stack Exchange posts, I've gathered that this is not possible because of how generics work. What I've not been able to find is an intuitive explanation of why that's the case. I learn best by understanding, so it would be really helpful if somebody could offer one.

I'm also interested in the best practice for accomplishing what I'm looking to do here. Should the constructor for LegionInputFormat accept a RecordReader class, store that, and then reference it later to create a new instance? Or is there a better solution?

(Additional background - context here is Hadoop, but I doubt it matters. I'm a fairly accomplished Data Scientist, but I'm pretty new to Java.)

John Chrysostom
  • [Type Erasure](https://docs.oracle.com/javase/tutorial/java/generics/erasure.html). – Elliott Frisch Dec 13 '16 at 13:01
  • I'm not sure why that prevents this, though? So, at compile time, `T` gets replaced by `LegionRecordReader` or any other `RecordReader` class I specify... Why does that prevent `return new T` from becoming `return new LegionRecordReader` at compile time? – John Chrysostom Dec 13 '16 at 13:03
  • At compile time, `T` gets replaced by its erasure: `java.lang.Object` for an unbounded type variable, or the bound (`RecordReader` here). **Not** `LegionRecordReader`. – Elliott Frisch Dec 13 '16 at 13:21
  • The compiler has no way of knowing whether `T` has a constructor that accepts `recordDelimiterBytes` as argument. – Ole V.V. Dec 13 '16 at 13:27
  • I have once or twice solved a similar problem with an abstract factory method that would create the `T` for me. It just requires the users of the class to make a little specialization implementing the factory method with that line, `return new LegionRecordReader(recordDelimiterBytes);`. – Ole V.V. Dec 13 '16 at 13:29
  • Possible duplicate of [Instantiating generics type in java](http://stackoverflow.com/questions/2434041/instantiating-generics-type-in-java). – Ole V.V. Dec 13 '16 at 13:31

2 Answers


> As the post title suggests, I'm being told I "cannot instantiate the Type T." From other Stack Exchange posts, I've gathered that this is not possible due to something with how generics work.

This is because generics in Java are almost entirely a compile-time feature. After checking the types, the compiler throws away the type arguments (this is called "type erasure"), so at runtime there is no such thing as the type variable T. That is why you cannot write new T(...): there is nothing left to say which class should be instantiated.
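To see erasure in action, here is a minimal sketch (the class name `ErasureDemo` is made up for illustration). Two lists declared with different type arguments turn out to share a single runtime class, which suggests the type arguments did not survive compilation.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    // Returns true when both lists have the same runtime class,
    // i.e. when <String> and <Integer> left no trace at runtime.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                 // true
        System.out.println(new ArrayList<String>().getClass()); // class java.util.ArrayList
    }
}
```

The same thing happens to `T` inside a generic class: the compiled code only knows its bound, not the type argument a caller chose.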

You can do this in Java by passing a Class<T> object to the method that needs to create an instance of T, and then creating an instance through reflection.
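A minimal sketch of that idea follows. It deliberately avoids the Hadoop classes so that it runs on its own: `DemoReader`, `LegionDemoReader` and `ReflectiveFormat` are hypothetical stand-ins, and it assumes the reader has a public constructor taking a `byte[]`.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for RecordReader<NullWritable, LegionRecord>.
interface DemoReader {
    byte[] delimiter();
}

// Hypothetical stand-in for LegionRecordReader.
class LegionDemoReader implements DemoReader {
    private final byte[] delimiter;

    public LegionDemoReader(byte[] delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public byte[] delimiter() {
        return delimiter;
    }
}

class ReflectiveFormat<T extends DemoReader> {
    // The Class<T> token carries the type information that erasure removes.
    private final Class<T> readerClass;

    ReflectiveFormat(Class<T> readerClass) {
        this.readerClass = readerClass;
    }

    T createReader(byte[] delimiter) {
        try {
            // Look up the public (byte[]) constructor and invoke it.
            Constructor<T> ctor = readerClass.getConstructor(byte[].class);
            return ctor.newInstance((Object) delimiter);
        } catch (ReflectiveOperationException e) {
            // Only detected at runtime: the compiler cannot check the constructor.
            throw new IllegalStateException(
                    "No usable (byte[]) constructor on " + readerClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        DemoReader reader =
                new ReflectiveFormat<>(LegionDemoReader.class).createReader(new byte[] {10});
        System.out.println(reader.getClass().getSimpleName());
    }
}
```

The trade-off is that a missing or mismatched constructor only shows up at runtime. Also note that Hadoop normally creates the input format itself, so in a real job the class token would probably have to come from the job configuration rather than from a constructor argument; that part is not shown here.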

Jesper
  • "there is no such thing as the type variable T at runtime." That really helps the type erasure explanation click for me. If T is being replaced by `XRecordReader` at compile time, why isn't it also replaced in `return new T` at compile time? – John Chrysostom Dec 13 '16 at 13:05
  • It's not being replaced by `XRecordReader` at compile time, but by `Object` (or by the bound, if `T` has one). For example, a `List<String>` just looks like a raw `List` that contains any kind of `Object` at runtime. – Jesper Dec 13 '16 at 13:08
  • @JohnChrysostom That would mean a separate version of the method would have to be made for every `T` that's used with it, i.e. it becomes a template. With erasure, you only need 1 version of the method. It's simply a design choice, that also happens to be backwards compatible. – Jorn Vernee Dec 13 '16 at 13:10

In your second code example the compiler has no way of knowing whether T has a constructor that accepts recordDelimiterBytes as argument. This is because each class is a separate compilation unit, so when LegionInputFormat is compiled, the compiler only knows that T is a RecordReader<NullWritable, LegionRecord>. It doesn't know which concrete types are used for T, and it has to assume that someone may later supply any class that extends RecordReader<NullWritable, LegionRecord>. We can tell the compiler something about T using extends, but there is no way in Java to specify that T has a constructor T(byte[]) (or whatever the type of recordDelimiterBytes is).
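To make the limits of a bound concrete, here is a small sketch. `BoundDemo` is a made-up class, and `CharSequence` merely stands in for the `RecordReader` bound. The bound lets us call the methods it declares, but it says nothing about constructors.

```java
class BoundDemo<T extends CharSequence> {

    // Allowed: the bound guarantees that every T has length().
    int lengthOf(T value) {
        return value.length();
    }

    // Rejected by the compiler if uncommented: the bound cannot promise
    // that every possible T has a constructor with these arguments.
    // T make(char[] chars) { return new T(chars); }

    public static void main(String[] args) {
        System.out.println(new BoundDemo<String>().lengthOf("abc")); // 3
    }
}
```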

I have used the following solution a couple of times, and even though it requires creating subclasses, I am quite happy with it. The work is still within the generic class. It is now declared abstract:

public abstract class InputFormat<T extends RecordReader<NullWritable, LegionRecord>>
        extends FileInputFormat<NullWritable, LegionRecord> {

    private byte[] recordDelimiterBytes;

    @Override
    public RecordReader<NullWritable, LegionRecord> createRecordReader(InputSplit split, TaskAttemptContext context) {

        /* Skipped code for getting recordDelimiterBytes */

        return constructRecordReader(recordDelimiterBytes);
    }

    // factory method for T objects
    protected abstract RecordReader<NullWritable, LegionRecord> constructRecordReader(byte[] recordDelimiterBytes);
}

For instantiation it requires you write a concrete subclass with just these few lines:

public class LegionInputFormat extends InputFormat<LegionRecordReader> {

    @Override
    protected RecordReader<NullWritable, LegionRecord> constructRecordReader(byte[] recordDelimiterBytes) {
        return new LegionRecordReader(recordDelimiterBytes);
    }

}

In the subclass we know the concrete class of T and therefore that class's constructors, so we can instantiate it. Though not quite as simple as you might have hoped for, I consider the solution nice and clean.

In my own code I have taken the opportunity to declare the factory method as returning type T:

protected abstract T constructRecordReader(byte[] recordDelimiterBytes);

Then you just have to follow up in the implementation:

protected LegionRecordReader constructRecordReader(byte[] recordDelimiterBytes) {

In this case it even gets a few characters shorter. On the other hand, in your case you don't seem to need it, so you may prefer to stay with the weaker return type of RecordReader<NullWritable, LegionRecord>.
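For reference, here is the whole pattern with the factory method returning T, as a self-contained sketch. It does not use the Hadoop classes: `BaseReader`, `LegionStyleReader`, `GenericFormat` and `LegionStyleFormat` are hypothetical stand-ins, and the hard-coded newline delimiter is only a placeholder for the skipped lookup code.

```java
// Hypothetical stand-in for RecordReader<NullWritable, LegionRecord>.
abstract class BaseReader {
    final byte[] delimiter;

    BaseReader(byte[] delimiter) {
        this.delimiter = delimiter;
    }
}

// Hypothetical stand-in for LegionRecordReader.
class LegionStyleReader extends BaseReader {
    LegionStyleReader(byte[] delimiter) {
        super(delimiter);
    }
}

abstract class GenericFormat<T extends BaseReader> {

    // The generic class still does the work and delegates only the "new".
    BaseReader createReader() {
        byte[] delimiter = {(byte) '\n'}; // placeholder for the skipped lookup
        return constructReader(delimiter);
    }

    // Factory method for T objects, implemented where T is known.
    protected abstract T constructReader(byte[] delimiter);
}

class LegionStyleFormat extends GenericFormat<LegionStyleReader> {

    // The return type matches the type argument chosen for T above.
    @Override
    protected LegionStyleReader constructReader(byte[] delimiter) {
        return new LegionStyleReader(delimiter);
    }

    public static void main(String[] args) {
        BaseReader reader = new LegionStyleFormat().createReader();
        System.out.println(reader.getClass().getSimpleName()); // LegionStyleReader
    }
}
```

Unlike the reflective approach, a constructor mismatch here is a compile error in the subclass rather than a runtime failure.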

Ole V.V.