I have tried to develop a regex that captures a method and its body (The modifier is not important), but I could not develop a solid solution. The regex that I came up with so far is this: \\b\\w*\\s*\\w*\\s*\\(.*?\\)\\s*\\{([^}]+)\\}
It does not capture the methods correctly because it does not consider matching balanced Curley braces. Thus, sometimes it captures part of the method and not all. What am I doing wrong or what could I do to improve the solution that can capture the whole method!

- 3
- 2
-
1[Dyck languages](https://en.wikipedia.org/wiki/Dyck_language) are not a great fit for regular expressions. I would recommend to not try and solve this problem with a regular expression. – Turing85 Aug 07 '21 at 23:44
-
Thank You all for the answers; the answers helped point me in the right direction. – James Aug 10 '21 at 02:05
1 Answers
You can't do this. It's impossible.
The 'regular' in 'Regular Expression' refers to a certain subset of grammars; the so-called 'Regular Grammars'.
Here's the thing:
- Non-Regular Grammars cannot be parsed with regular expressions.
- Java (the language) is Non-Regular.
Thus, you can't use regular expressions for this, QED.
So, how do you parse java?
There are many ways; so far, java is still so-called LL(k) parseable, which means that just about every 'parser/grammar' library out there will be capable of parsing java code, and many such libraries ship with a java grammar as an example. These usually aren't quite perfect, but pretty good.
A basic web search gets you many options. Alternatively, javac is free (but GPL, you'd have to GPL anything you build with it), and ecj (the parser that powers eclipse, amongst other things) is open source with a more permissive license. It's also faster. It's also far harder to use, so there's that.
These are fairly complex tools. However, java is a very complex language (much programming languages are). Parsing them is decidedly non-trivial.
Before you think: Geez, surely it can't be this hard, consider:
public void test {
{}
String x = "{";
}
Which is legal java.
Or:
public void test() {
// method body
\u007D
That really is legal java, that \u007D
thing closes it. Of course...
public void test() {
//{} \u007D
}
Here the \u
thing doesn't. It is a real closing brace, but, that is in a comment.
Another one to consider:
public void test() {
class Foo {
String y = """
}
""";
}
}
Hopefully, considering the above, you realize you stand absolutely no chance whatsoever unless you use a parser that knows about the entire language spec.

- 85,357
- 5
- 51
- 72