I have a requirement where each component of a method signature has to be identified by its own regular expression. For example, there needs to be a regex for the return type, one for the method name, one for the argument type and one for the argument name. So far I have come up with the following expression:

([^,]+) ([^,]+)\((([^,]+) ([^,]+))\)

It works well for a method signature like this:

ReturnType foo(Arg parameter)

The regular expression identifies ReturnType, foo, Arg and parameter separately.

Now the problem is that a method can have zero, one, or multiple arguments separated by commas. I am not able to write a repeating expression for this. Any help would be appreciated.

polygenelubricants
nitesh
  • Regular expressions are not powerful enough to match a Java method definition. You will need a parser instead. – JaredPar Jun 09 '10 at 15:13
  • @Jared, are you sure the Method definitions aren't regular? – jjnguy Jun 09 '10 at 15:16
  • Even if you can build a regex for this, it's going to be horribly ugly. Think about newlines, whitespace, comments, etc. - all of which can be present between the arguments and types, among others. – Rob Hruska Jun 09 '10 at 15:18
  • @Jjnguy, yes, because signatures contain types, and types themselves are not regular if you consider nested generics. Nested generics require a matching set of `<>`'s, and once you have matching plus nesting you have counting, which puts it into the class of context-free grammars. – JaredPar Jun 09 '10 at 15:32
  • @Jared, that explanation makes great sense. Thanks. – jjnguy Jun 09 '10 at 15:42
  • @JaredPar I agree with your comments and the suggestion to use a parser. For now, as I mentioned in the question, I need a solution only for a simple form of Java method signature like ReturnType foo(Arg parameter), on one line, with only "Arg parameter" recurring, and I believe a regex can be written for such a method. – nitesh Jun 11 '10 at 15:25
  • The same question answered elsewhere: http://stackoverflow.com/questions/68633/regex-that-will-match-a-java-method-declaration – aliteralmind Sep 26 '13 at 15:09

2 Answers

If you choose to go down the road of using regex/String manipulation, you could pull out the entire argument string, split it on commas and split the resulting strings on white space.
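The split-based approach above could look roughly like the following. This is only a sketch under simplifying assumptions: the signature is on one line, has no modifiers, annotations or generics, and every argument is exactly "Type name". The class name SignatureSplitter and the method parse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignatureSplitter {
    // Captures the return type, the method name, and the raw text between the parentheses.
    private static final Pattern SIGNATURE =
            Pattern.compile("\\s*(\\S+)\\s+(\\w+)\\s*\\(([^)]*)\\)\\s*");

    /** Returns [returnType, methodName, argType1, argName1, ...], or null if the line does not fit. */
    static List<String> parse(String line) {
        Matcher m = SIGNATURE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> parts = new ArrayList<>();
        parts.add(m.group(1));
        parts.add(m.group(2));
        String args = m.group(3).trim();
        if (!args.isEmpty()) {
            // Split on commas (limit -1 keeps a trailing empty piece so "a b," is rejected below).
            for (String arg : args.split("\\s*,\\s*", -1)) {
                // Split each argument on whitespace; assume exactly "Type name".
                String[] typeAndName = arg.trim().split("\\s+");
                if (typeAndName.length != 2) {
                    return null;
                }
                parts.add(typeAndName[0]);
                parts.add(typeAndName[1]);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("ReturnType foo(Arg parameter)"));
        System.out.println(parse("void bar()"));
        System.out.println(parse("int baz(String a, Other b)"));
    }
}
```

The regex only has to find the pieces around the parentheses; the repetition is handled by ordinary string splitting, which sidesteps the problem of a repeating capture group.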

That said, I agree with JaredPar's comment on your question, at least if you expect to handle everything that is valid in a Java API.

For example, a series of keywords can prefix your method (public/private, static, final). Annotations can also appear on either the method or the parameters. Even something as simple as a tab or newline instead of a space between the return type and the method name will break your current regex.

Good Luck

Angelo Genovese

Let's abstract this out a bit, and say we want to match a (possibly empty) list of digits separated by commas.

(empty)
12
12,34
12,34,56

The pattern is therefore

^$|^\d+(,\d+)*$
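As a quick check, the pattern can be tried in Java against the examples above plus a few malformed lists. The class name DigitListDemo and the helper isDigitList are just illustrative names.

```java
import java.util.regex.Pattern;

class DigitListDemo {
    // Either the empty string, or one number followed by zero or more ",number" groups.
    static final Pattern DIGIT_LIST = Pattern.compile("^$|^\\d+(,\\d+)*$");

    static boolean isDigitList(String s) {
        return DIGIT_LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "12", "12,34", "12,34,56", "12,", ",12", "12,,34"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isDigitList(s));
        }
    }
}
```

The first four samples are accepted; the ones with a leading, trailing or doubled comma are rejected, because every comma must be followed by another number.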

Now you can try to replace the components to match what you want:

  • Instead of \d+, whatever regex you use to match type name and identifier
  • Maybe allow \s* around the comma
  • Maybe you'd even add the special varargs last argument (which can also be the first and only)
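Here is one way the substitution might look in Java, as a sketch only. It assumes a type or a name is a plain identifier (no generics, arrays, annotations or varargs), and the names ArgumentListMatcher, ARG_LIST, ONE_ARG and parseArgs are invented for this example. Note that in Java a repeated capturing group only retains its last iteration, so the sketch validates the whole list with one pattern and then walks the arguments with a second pattern and Matcher.find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArgumentListMatcher {
    // Assumption: types and names are plain Java-style identifiers.
    static final String ID = "[A-Za-z_$][A-Za-z0-9_$]*";
    static final String ARG = ID + "\\s+" + ID;

    // Same shape as ^$|^\d+(,\d+)*$, with \d+ replaced by ARG and \s* allowed around commas.
    static final Pattern ARG_LIST =
            Pattern.compile("^\\s*$|^\\s*" + ARG + "(\\s*,\\s*" + ARG + ")*\\s*$");

    // Used after validation to pull out each (type, name) pair in turn.
    static final Pattern ONE_ARG = Pattern.compile("(" + ID + ")\\s+(" + ID + ")");

    /** Returns one {type, name} pair per argument, or null if the list is malformed. */
    static List<String[]> parseArgs(String argList) {
        if (!ARG_LIST.matcher(argList).matches()) {
            return null;
        }
        List<String[]> result = new ArrayList<>();
        Matcher m = ONE_ARG.matcher(argList);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : parseArgs("Arg parameter, Other second")) {
            System.out.println(pair[0] + " / " + pair[1]);
        }
    }
}
```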

Note that if you allow generic type parameters, then you definitely can't use regex, since you can nest the <...> and the language of balanced parentheses of arbitrary depth is not regular.

Although you can argue that in practice, no one would ever nest type parameters deeper than, say, 3 levels, so then it becomes regular again.
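To illustrate the bounded-depth argument, a regex for type names with at most N levels of <...> can be built by repeated substitution. This is a sketch with simplifying assumptions: names are matched by \w+ only, with no wildcards, bounds, arrays or dotted package names. The names BoundedGenericType, typeRegex and isType are made up for the example.

```java
import java.util.regex.Pattern;

class BoundedGenericType {
    /** Builds a regex source for a type name with at most maxDepth levels of nested <...>. */
    static String typeRegex(int maxDepth) {
        String type = "\\w+"; // depth 0: a plain name, no type parameters
        for (int depth = 1; depth <= maxDepth; depth++) {
            // A name, optionally followed by <T, T, ...>, where T allows one level less nesting.
            type = "\\w+(?:<\\s*" + type + "(?:\\s*,\\s*" + type + ")*\\s*>)?";
        }
        return type;
    }

    static boolean isType(String s, int maxDepth) {
        return Pattern.matches(typeRegex(maxDepth), s);
    }

    public static void main(String[] args) {
        System.out.println(isType("Map<String, List<Set<Integer>>>", 3));
        System.out.println(isType("List<List<List<List<String>>>>", 3));
    }
}
```

The pattern text roughly doubles with each extra level, which is one practical reason to keep the depth small or switch to a parser.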

That said, a proper parser is really the best tool for this. Just look for an implementation of the Java grammar, say, in ANTLR.


polygenelubricants
  • Thank you, polygenelubricants, for the pointers. I was able to generate the expected output for the problem along similar lines. – nitesh Jun 11 '10 at 16:03