I'm trying to write a utility method that returns the definition of a field, including all of its generic arguments. To do so, I retrieve the generic type of a field via Field.getGenericType() and parse the type name, which has the following syntax (in EBNF):

generictype = classname [ '<' generictype { ',' ' ' generictype } '>' ]
classname   = [ package '.' ] ( letter | '_' ) { letter | digit | '_' }
package     = packagepart { '.' packagepart }
packagepart = ( letter | '_' ) { letter | digit | '_' }

My first attempt was to parse this with the regular expression

(?<argument>\w+(?:\.\w+)*(?:\<(?<arglist>\g<argument>(?:,\s\g<argument>)*)\>)?)

whose details can be inspected here. This regular expression does exactly what I need, but Java's regex engine does not support the \g<name> construct, so I can't use this approach if I want to support generic types whose arguments are nested to an unknown depth.

Is there any other approach that I can use? If yes, how can I achieve what I'm trying to do?


EDIT: The reason I want to achieve this is that I have a configuration and want to transfer its contents into the respective fields of an object. It is a kind of deserialization, if you want to call it that. The configuration only supports primitive types, java.lang.String, java.util.List<T>, java.util.Map<K, V> and java.util.Map.Entry<K, V>. To retrieve values of those classes, the client has to provide a class as a parameter, which is then used to deserialize the strings saved in the configuration. Because of that, I have to determine which generic parameters a field of a class uses and which classes they correspond to.

mezzodrinker
  • Java's compiler already does a pretty good job of parsing Java syntax - why reinvent the wheel? – Boris the Spider Mar 14 '16 at 22:11
  • @BoristheSpider If you could also tell me how I get the Java compiler to give me the arguments of a generic type, then there'd be no problem in using it at all. – mezzodrinker Mar 14 '16 at 22:13
  • I would [start here](https://www.javacodegeeks.com/2015/09/java-compiler-api.html). There is a whole API for examining the AST generated by the Java compiler... – Boris the Spider Mar 14 '16 at 22:17
  • It's really confusing what you are trying to do: Do you want to parse this at runtime? Why are you parsing it with regexp instead of reflection api completely? – highstakes Mar 14 '16 at 22:19
  • @highstakes any number of reasons; static code analysis? Code generation? Higher order programming? – Boris the Spider Mar 14 '16 at 22:20
  • @BoristheSpider Now what do I do if I do not have the source code of the class that I'd like to inspect or do not know where that code is located at? – mezzodrinker Mar 14 '16 at 22:21
  • Sorry what? You want to analyse the source of a file that may or may not exist in some unknown location? I would suggest you consult something like [this](http://www.prezzybox.com/magic-wand-note-book-2.aspx?gclid=Cj0KEQjwwpm3BRDuh5awn4qJpLwBEiQAATTAQYAfuapow39nLmBAhzF-pfuBtpBxmRbkvcuKcGYWsjEaAi518P8HAQ). – Boris the Spider Mar 14 '16 at 22:22
  • @highstakes Yes, I want to parse it at runtime. And I have to use something different than the reflection API because java.lang.reflect only gives me a String describing the arguments but doesn't let me get the class itself. – mezzodrinker Mar 14 '16 at 22:23
  • @BoristheSpider Apparently, my field of use isn't quite clear, I'm going to edit the question. – mezzodrinker Mar 14 '16 at 22:24
  • "...I do not have the source code" Then how are you going to get the generic type? The compiled code has type-erased it. So you need the source code by absence of choice. I agree, you want to use a parser that already knows how to do this, unless you want to reinvent all the syntax that goes with what can say in a type. The JavaC compiler is certainly one choice. Other choices include tools that parse Java source code; consider program transformation systems, which are designed to parse code and transform it to something different, which is actually what you want to do. Check my bio. – Ira Baxter Mar 14 '16 at 22:47
  • @IraBaxter See my [answer](http://stackoverflow.com/a/35999518/1898236) below. Java provides a `getGenericType()` method for `Field`, which is exactly what I'm using and working with. – mezzodrinker Mar 14 '16 at 22:48
  • @IraBaxter the generics are present in the bytecode, just not used by the JVM. You can in fact [read them via reflection](http://blog.xebia.com/acessing-generic-types-at-runtime-in-java/). Yes, Java generics are type-erased, but the declarations themselves are not reified. – Boris the Spider Mar 14 '16 at 22:49
  • This is a context-free grammar, which cannot be parsed by regular expressions. Have a look at both these concepts https://en.wikipedia.org/wiki/Regular_language and https://en.wikipedia.org/wiki/Context-free_language. – Sci Prog Mar 15 '16 at 00:06
  • `generictype` recurses to `generictype`. You can't parse that with a regular expression. – erickson Mar 15 '16 at 00:13
  • @erickson I know, that is why I asked for a different approach. On a related note: You **can** recurse a regex subpattern but that depends on the implementation. For example, java regex can't, but Perl can. – mezzodrinker Mar 15 '16 at 00:41

3 Answers

You can do the following:

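Type type = field.getGenericType();  // 'field' is a java.lang.reflect.Field instance
if (type instanceof ParameterizedType) {
    ParameterizedType pt = (ParameterizedType) type;
    // This cast only succeeds if the first argument is a plain class such as String;
    // a nested generic argument is itself a ParameterizedType.
    Class<?> genericType  = (Class<?>) pt.getActualTypeArguments()[0];
}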

If this is not enough, use the reflection utilities in Google's Guava library: https://github.com/google/guava/wiki/ReflectionExplained
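
The instanceof check above has to be repeated for every type argument, because an argument such as List<String> is again a ParameterizedType. Below is a minimal sketch of that recursion. It assumes that only plain classes and parameterized types occur; wildcards, type variables and generic arrays are rejected. The names TypeWalker, collectClasses, classesOfField and the sample field are hypothetical and exist only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypeWalker {
    // Hypothetical example field whose declared type nests generics two levels deep.
    static List<Map<String, List<Integer>>> sample = new ArrayList<>();

    // Adds the raw class of 'type' and then, depth-first, the classes of all its type arguments.
    static void collectClasses(Type type, List<Class<?>> out) {
        if (type instanceof Class) {
            out.add((Class<?>) type);
        } else if (type instanceof ParameterizedType) {
            ParameterizedType pt = (ParameterizedType) type;
            // getRawType() is declared to return Type; for types read from a field
            // via reflection it is a Class, which this sketch relies on.
            out.add((Class<?>) pt.getRawType());
            for (Type argument : pt.getActualTypeArguments()) {
                collectClasses(argument, out);
            }
        } else {
            // Wildcards, type variables and generic arrays are out of scope here.
            throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Convenience wrapper: looks up a declared field by name and walks its generic type.
    static List<Class<?>> classesOfField(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            List<Class<?>> classes = new ArrayList<>();
            collectClasses(field.getGenericType(), classes);
            return classes;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classesOfField(TypeWalker.class, "sample"));
    }
}
```

This yields Class objects directly, so no type name has to be parsed or passed to Class.forName.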

highstakes
  • The code you posted doesn't work because the values returned by `ParameterizedType.getActualTypeArguments()` are not of type `Class` and thus result in a `ClassCastException`. I didn't try Guava, though. – mezzodrinker Mar 14 '16 at 23:18
  • Well, you gotta continue the recursion if you have embedded ParameterizedType inside ParameterizedType, check the instance again and again until you get a class. – highstakes Mar 14 '16 at 23:26

If you really need to parse (highstakes' approach looks more elegant, but parsing is what the question asks for), I'd do this with recursive descent parsing, something like this:

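import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class GenericType {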
  String baseName;
  List<GenericType> params;

  GenericType(String baseName, List<GenericType> params) {
    this.baseName = baseName;
    this.params = params;
  }

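  static GenericType parse(String s) {
    StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
    tokenizer.wordChars('.', '.');  // Make dots part of the name
    tokenizer.wordChars('_', '_');  // Underscores are not word characters by default
    tokenizer.wordChars('$', '$');  // Binary names of nested classes contain '$'
    try {
      tokenizer.nextToken();  // Advance to the first token
      return parse(tokenizer);
    } catch (IOException e) {
      throw new RuntimeException(e);  // Keep the original cause
    }
  }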

  static GenericType parse(StreamTokenizer tokenizer) throws IOException {
    String baseName = tokenizer.sval;
    tokenizer.nextToken();
    List<GenericType> params = new ArrayList<>();
    if (tokenizer.ttype == '<') {
      do {
        tokenizer.nextToken();  // Skip '<' or ','
        params.add(parse(tokenizer));
      } while (tokenizer.ttype == ',');
      tokenizer.nextToken();  // skip '>'
    }
    return new GenericType(baseName, params);
  }
}
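
For comparison, here is a sketch of the same recursive descent idea written against a plain character index instead of StreamTokenizer. It is not part of the answer above. The names TypeNameParser and Node are hypothetical, and the sketch assumes that a name consists of Java identifier characters and dots. Unlike the version above, it rejects malformed input with an IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeNameParser {
    // Parsed node: a class name plus its (possibly empty) list of type arguments.
    static final class Node {
        final String name;
        final List<Node> args = new ArrayList<>();

        Node(String name) { this.name = name; }

        @Override
        public String toString() {
            if (args.isEmpty()) return name;
            List<String> parts = new ArrayList<>();
            for (Node arg : args) parts.add(arg.toString());
            return name + "<" + String.join(", ", parts) + ">";
        }
    }

    private final String input;
    private int pos = 0;

    private TypeNameParser(String input) { this.input = input; }

    // Parses a complete type name; trailing characters are treated as an error.
    static Node parse(String s) {
        TypeNameParser p = new TypeNameParser(s);
        Node node = p.genericType();
        if (p.pos != s.length()) throw p.error("unexpected trailing input");
        return node;
    }

    // generictype = classname [ '<' generictype { ',' generictype } '>' ]
    private Node genericType() {
        skipSpaces();
        int start = pos;
        while (pos < input.length() && isNameChar(input.charAt(pos))) pos++;
        if (pos == start) throw error("class name expected");
        Node node = new Node(input.substring(start, pos));
        if (peek('<')) {
            do {
                pos++;  // consume '<' or ','
                node.args.add(genericType());  // recursion handles arbitrary nesting depth
                skipSpaces();
            } while (peek(','));
            if (!peek('>')) throw error("'>' expected");
            pos++;  // consume '>'
        }
        return node;
    }

    private static boolean isNameChar(char c) {
        return Character.isJavaIdentifierPart(c) || c == '.';
    }

    private boolean peek(char c) { return pos < input.length() && input.charAt(pos) == c; }

    private void skipSpaces() { while (peek(' ')) pos++; }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos + " in \"" + input + "\"");
    }

    public static void main(String[] args) {
        Node node = parse("java.util.Map<java.lang.String, java.util.List<java.lang.Integer>>");
        System.out.println(node.name + " has " + node.args.size() + " arguments: " + node);
    }
}
```

The recursion in genericType() is what the regular expression could not express in Java: each nested argument list is handled by another call rather than by a recursive subpattern.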
Stefan Haustein
  • I don't see how `parse(String)` is supposed to work. It creates a new `StreamTokenizer` but never assigns any contents to it. I suppose you wanted to write `StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));`? – mezzodrinker Mar 15 '16 at 17:17
  • @mezzodrinker Thanks, fixed (including some other typos). Note that this actually answers the original question in a concise manner -- I agree that for your actual use case, the accepted answer is a better solution, but that may or may not be the case for people searching for a solution to a similar problem. – Stefan Haustein Mar 15 '16 at 21:39
  • I was just waiting for some (more or less) obvious issues to be fixed :) – mezzodrinker Mar 15 '16 at 22:10

I found a solution here, in a question that tries to do exactly the same thing, only in C#. Because of that difference, I had to rewrite the code by Erik_at_Digit a little and eventually ended up with this solution:

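import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class ClassUtil {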
    // https://stackoverflow.com/questions/20532691/how-to-parse-c-sharp-generic-type-names?rq=1
    static List<String> splitByComma(String typeArgumentList) {
        List<String> strings = new LinkedList<>();
        StringBuilder sb = new StringBuilder();
        int level = 0;

        for (int i = 0; i < typeArgumentList.length(); i++) {
            char c = typeArgumentList.charAt(i);
            if (c == ',' && level == 0) {
                strings.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c);
            }

            if (c == '<') {
                level++;
            }
            if (c == '>') {
                level--;
            }
        }

        strings.add(sb.toString());

        return strings;
    }

    static GenericType getGenericType(String description) throws ClassNotFoundException {
        Type type;
        GenericType[] parameters;
        if (!description.contains("<")) {
            type = Class.forName(description);
            parameters = new GenericType[0];
        } else {
            int start = description.indexOf('<');
            int end = description.lastIndexOf('>');
            String typeArgumentList = description.substring(start + 1, end);
            String name = description.substring(0, start);
            List<String> arguments = splitByComma(typeArgumentList);

            type = Class.forName(name);
            parameters = new GenericType[arguments.size()];

            for (int i = 0; i < arguments.size(); i++) {
                String argument = arguments.get(i).trim();
                parameters[i] = getGenericType(argument);
            }
        }

        return new GenericType(type, parameters);
    }

    public static GenericType getGenericType(Type type) throws ClassNotFoundException {
        String description = type.getTypeName();
        if (!description.contains("<")) return new GenericType(type);
        return getGenericType(description);
    }

    static List<Map<List<String>, Object>> field = new LinkedList<>();

    public static void main(String[] args) throws Throwable {
        Field field = ClassUtil.class.getDeclaredField("field");
        System.out.println(field.getGenericType());
        System.out.println(getGenericType(field.getGenericType()));
    }

    static class GenericType implements Type {
        final String        typeName;
        final Type          type;
        final GenericType[] parameters;

        public GenericType(Type type, GenericType... parameters) {
            this.type = type;
            this.parameters = parameters;
            typeName = buildTypeName(type, parameters);
        }

        private static String buildTypeName(Type type, GenericType... parameters) {
            StringBuilder s = new StringBuilder();
            s.append(type.getTypeName());

            if (parameters.length > 0) {
                CharSequence[] names = new CharSequence[parameters.length];
                for (int i = 0; i < parameters.length; i++) {
                    names[i] = parameters[i].getTypeName();
                }
                s.append("<").append(String.join(", ", names)).append(">");
            }

            return s.toString();
        }

        @Override
        public String toString() {
            return getTypeName();
        }

        @Override
        public String getTypeName() {
            return typeName;
        }
    }
}

This surely isn't the best-performing solution, but it does what it was designed to do.

mezzodrinker