3

Regular expressions are a weakness of mine.

I am looking for a regex or other technique that will allow me to read an arbitrary string and determine whether it is a valid Java function declaration.

Good:

public void foo()
void foo()
static protected List foo()
static List foo()

Bad:

public List myList = new List()

Code:

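for (String line : lines)
{
    // Attempted pattern, recalled from memory; it does not work as intended.
    if (line.matches("(public|protected|private)*(\\w)*\\("))
    {
        // treat the line as a function declaration
    }
}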

Is there such a regex that will return true if it's a valid java function?

Eric Leschinski
Woot4Moo
  • Could you look for a ; at the end of the line? You may have to look at rows around it as well – RNJ Sep 12 '12 at 16:20
  • I'm on a phone right now so I don't have the source in front of me. I'll try to recall it from memory in an update – Woot4Moo Sep 12 '12 at 16:20
  • Something like the above, as I recall – Woot4Moo Sep 12 '12 at 16:25
  • The general case of "any valid java function" most probably cannot be done with a regexp; it requires a proper parser. Do you have any limits on what needs to be recognized? –  Sep 12 '12 at 16:29
  • look at javacc grammar files http://javacc.java.net/doc/javaccgrm.html – aviad Sep 12 '12 at 16:29
  • Do you need to be able to detect methods with generic parameters or return types? If so, you will not be able to use a regex, as regexes cannot describe a context-free grammar, which is needed to describe generics (that is, the generic parameter of a type may itself be a generic type, and so on). – Dunes Sep 12 '12 at 16:38
  • @Dunes No, I do not care about types, just that the placement of parameters is valid. – Woot4Moo Sep 12 '12 at 16:43
  • Does this help: http://stackoverflow.com/questions/68633/regex-that-will-match-a-java-method-declaration – Kev Sep 14 '12 at 00:06

2 Answers

6
/^\s*((public|private|protected|static)\s+)*\w+\s+\w+\s*\(.*?\)\s*$/m

Matches:

  • Start of line <^>
  • Arbitrary whitespace <\s*>
  • Zero or more modifiers (an access modifier or static, in any order), each followed by at least one space <((public|private|protected|static)\s+)*>
  • A Java identifier (which you should hope is a class name or primitive type) <\w+>
  • At least one space <\s+>
  • A Java identifier (the function name) <\w+>
  • Arbitrary whitespace <\s*>
  • Open paren <\(>
  • Arbitrary arguments (no checking done here, because of the massive mess) <.*?>
    • The ? does lazy matching
  • Close paren <\)>
  • Arbitrary whitespace <\s*>
  • End of line <$>

This is FAR from complete, but ought to suit your needs.
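
For use from Java, here is a minimal sketch of this kind of line-by-line check. The class name `MethodLineMatcher`, the method `looksLikeMethod`, and the exact pattern (any mix of `public`, `private`, `protected` and `static`, in any order) are assumptions made for illustration, not a complete solution.

```java
import java.util.regex.Pattern;

class MethodLineMatcher {
    // Hypothetical pattern: zero or more of the listed modifiers in any order,
    // then a return type, a method name, and a parenthesised argument list.
    // The argument list itself is not validated.
    private static final Pattern METHOD_LINE = Pattern.compile(
        "^\\s*(?:(?:public|private|protected|static)\\s+)*\\w+\\s+\\w+\\s*\\(.*?\\)\\s*$");

    static boolean looksLikeMethod(String line) {
        return METHOD_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "public void foo()",
            "void foo()",
            "static protected List foo()",
            "static List foo()",
            "public List myList = new List()"
        };
        for (String line : lines) {
            System.out.println(looksLikeMethod(line) + "  " + line);
        }
    }
}
```

Because `\w+` accepts any word, a line such as `return foo()` also passes this check; excluding keywords needs extra logic on top of the regex.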

FrankieTheKneeMan
  • The java code conventions call for using one of the access modifiers (`public`, `private` or `protected`) before using the `static` indicator. Of course, you *may* code them the other way around. – Maarten Bodewes Sep 12 '12 at 20:45
3

Depends how rigorous you need it to be, because it can get fairly complex as a regex.

The grammar for method declarations in Java is something like the following:

Java method declaration BNF:

method_declaration 
    ::= 
    { modifier } type identifier 
    "(" [ parameter_list ] ")" { "[" "]" } 
    ( statement_block | ";" ) 

You also have to check things such as allowing multiple modifiers while rejecting a repeated modifier or more than one access modifier, and making sure the type and identifier are not Java keywords (apart from `void` or a primitive for the type). It starts getting hairy, and I doubt you'd want to write your own Java parser.
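
To give a feel for those extra checks, here is a rough sketch that tokenizes the part of a declaration before the opening parenthesis instead of using one large regex. The class `ModifierChecker`, the method `headerLooksValid`, and the keyword sets (deliberately small subsets) are all hypothetical, and the parameter list and body are not examined at all.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ModifierChecker {
    static final Set<String> ACCESS = new HashSet<>(Arrays.asList(
        "public", "protected", "private"));
    static final Set<String> MODIFIERS = new HashSet<>(Arrays.asList(
        "public", "protected", "private", "static", "final",
        "abstract", "native", "synchronized", "strictfp"));
    // Illustrative subset of reserved words that can be neither a type nor a name.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "return", "new", "if", "else", "for", "while", "class", "throw", "import"));
    // Allowed as a return type but not as a method name.
    static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
        "void", "boolean", "byte", "char", "short", "int", "long", "float", "double"));

    static boolean isWord(String s) {
        return s.matches("[A-Za-z_$][\\w$]*");
    }

    static boolean headerLooksValid(String header) {
        int paren = header.indexOf('(');
        if (paren < 0) return false;
        String[] tokens = header.substring(0, paren).trim().split("\\s+");
        if (tokens.length < 2) return false;          // need at least type and name

        Set<String> seen = new HashSet<>();
        int accessCount = 0;
        for (int i = 0; i < tokens.length - 2; i++) { // everything before the type
            String t = tokens[i];
            if (!MODIFIERS.contains(t)) return false; // not a modifier at all
            if (!seen.add(t)) return false;           // same modifier repeated
            if (ACCESS.contains(t)) accessCount++;
        }
        if (accessCount > 1) return false;            // e.g. "public private"

        String type = tokens[tokens.length - 2];
        String name = tokens[tokens.length - 1];
        boolean typeOk = isWord(type) && !RESERVED.contains(type) && !MODIFIERS.contains(type);
        boolean nameOk = isWord(name) && !RESERVED.contains(name)
            && !MODIFIERS.contains(name) && !PRIMITIVES.contains(name);
        return typeOk && nameOk;
    }
}
```

Even this small sketch already needs several keyword tables, which is a fair indication of why a real parser is the safer route for anything rigorous.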

dule
  • The BNF would lead to something like this: `/^\s*?(((public|private|protected|static|final|native|synchronized|abstract|threadsafe|transient)\s+?)*)\s*?(\w+?)\s+?(\w+?)\s*?\(([^)]*)\)[\w\s,]*?(\{)?\s*?$/gm` and it doesn't check the semantics, either, it only checks the syntax. – caw Aug 09 '14 at 03:56