What is a regular expression for detecting a for loop, and another one for detecting a while loop?
I want to detect for(--;--;--)
and while (--comparison operator--)
constructs.

-
No can do. Consider: `String s = "for(--;--;--)";` and: `/* for(--;--;--) */` – Bart Kiers Nov 21 '10 at 21:43
7 Answers
You can't do this reliably with a regex. You need to parse the code with a proper parser.

-
And for an example, I recommend JavaParser from http://code.google.com/p/javaparser, a Java parser which parses Java code. The interesting/relevant part is the java_1_5.jj file which contains the tokens/grammar for the Java language (defined partially using regular expressions). – asdfjklqwer Nov 21 '10 at 19:36
-
I am sorry but such regular expression can be created. Matt does not want to build full parser of Java code. He has to detect for/while loops only. – AlexR Nov 21 '10 at 21:33
-
@AlexR, well, then post them here. Be sure to account for `for`'s and `while`'s in comments, in String literals and ones like @pst posted: `for (String s = "foo;"; s != null; s = f(s))` – Bart Kiers Nov 21 '10 at 22:03
-
@David, yeah, but it's always fun to see attempts posted. (sorry for teasing you a bit @AlexR!) :) – Bart Kiers Nov 21 '10 at 22:16
You can parse almost anything with modern (PCRE-style) regex. However, parsing certain things correctly is often pathologically difficult. It's easy to build a small, terse regex to match only certain kinds of simply formatted for loops:
for\s*\([^;]*?;[^;]*?;[^)]*?\)
But what happens when you run into something like this?
int i = 0;
for(
String s = "for(0;1;2)";
s.indexOf(String.valueOf(i)) != -1;
i++ // increment the i variable ;-)
)
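To make the failure concrete, here is a minimal sketch that runs the terse pattern above against a simple loop and against the awkward one. The class and member names (`SimpleForRegexDemo`, `firstMatch`, `SIMPLE`, `TRICKY`) are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleForRegexDemo {
    // The terse pattern from the answer, written as a Java string literal.
    static final Pattern FOR_LOOP =
            Pattern.compile("for\\s*\\([^;]*?;[^;]*?;[^)]*?\\)");

    // A simply formatted loop header.
    static final String SIMPLE = "for (int i = 0; i < n; i++) { total += i; }";

    // The awkward loop: a string literal containing "for(0;1;2)",
    // line breaks, and a trailing comment containing ";-)".
    static final String TRICKY = "for(\n"
            + "    String s = \"for(0;1;2)\";\n"
            + "    s.indexOf(String.valueOf(i)) != -1;\n"
            + "    i++ // increment the i variable ;-)\n"
            + ")";

    /** Returns the first substring the pattern matches, or null if there is none. */
    static String firstMatch(String source) {
        Matcher m = FOR_LOOP.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("simple: " + firstMatch(SIMPLE));
        System.out.println("tricky: " + firstMatch(TRICKY));
    }
}
```

For the simple loop the match is the loop header. For the awkward loop the pattern still reports a match, but it counts the semicolons inside the string literal and stops at the `)` that closes `"for(0;1;2)`, so the reported text is not the real loop header.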
Better to use a full-blown purpose-built Java parser if you need 100% reliability. The java.net article Source Code Analysis Using Java 6 APIs gives a jumping-off point for one way to do reliable parsing of Java source code.
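As a rough sketch of the parser-based route, the following uses the JDK's Compiler Tree API (`javax.tools` plus `com.sun.source`) to count classic `for` and `while` loops. It assumes a full JDK is available (on Java 6 to 8 that means `tools.jar` on the classpath); the names `LoopCounter`, `StringSource` and `countLoops` are invented for this example and do not come from the article.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class LoopCounter {
    /** Hypothetical helper: an in-memory source file, so no temp files are needed. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns {classic for loops, while loops} found in the given source text. */
    static int[] countLoops(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (JRE only?)");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                Collections.singletonList(new StringSource(className, code)));
        final int[] counts = new int[2];
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitForLoop(ForLoopTree node, Void p) {
                counts[0]++;                        // for (init; cond; update)
                return super.visitForLoop(node, p); // keep scanning nested loops
            }

            @Override
            public Void visitWhileLoop(WhileLoopTree node, Void p) {
                counts[1]++;
                return super.visitWhileLoop(node, p);
            }
        };
        try {
            // Parse only; no attribution or code generation is performed.
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

Because the compiler's own parser does the work, loops mentioned inside string literals or comments are not counted. Enhanced `for (x : xs)` loops are reported separately through `visitEnhancedForLoop`, which this sketch does not override.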
In reply to Taz's comment:
I did it with
.*for\(.*;.*;.*\).*
what could be wrong with this?
Assuming all the for-loops you want to match have:
- no linebreaks in them,
- no embedded/trailing comments
- no "string" or 'c'haracter literals in them
I think your pattern should be OK. You might want to allow for whitespace between the for and the opening parenthesis:
.*for\s*\(.*;.*;.*\).*
However, as tchrist points out in his answer to this question, \s* is not a perfectly correct way to allow for whitespace in Java source code, as Java source code supports types of Unicode whitespace that \s does not allow for. Again, if you need 100% reliability, a full Java source code parser is probably a better choice.
Make sure you turn off (or don't turn on) the "dot matches newline" option in your regex engine (e.g. DOTALL or Singleline). Otherwise the pattern could match across multiple lines, which is likely to produce incorrect matches.
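A small sketch of why the "dot matches newline" option matters, using the whitespace-tolerant pattern with both parentheses escaped. The class name `DotAllDemo`, the helper `found` and the sample snippet are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // The whitespace-tolerant pattern, with both parentheses escaped.
    static final String FOR_REGEX = ".*for\\s*\\(.*;.*;.*\\).*";

    // An enhanced for loop: there is no for(;;) header here at all.
    static final String ENHANCED_FOR = "for (Item it : items) {\n"
            + "    a(); b();\n"
            + "    c();\n"
            + "}";

    /** True if the pattern is found anywhere in the source, using the given flags. */
    static boolean found(String source, int flags) {
        return Pattern.compile(FOR_REGEX, flags).matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println("default flags: " + found(ENHANCED_FOR, 0));
        System.out.println("DOTALL:        " + found(ENHANCED_FOR, Pattern.DOTALL));
    }
}
```

With default flags `.` stops at line breaks, so the semicolons on the following lines cannot be pulled into the match and nothing is reported. With `Pattern.DOTALL` the same pattern stretches across the loop body and reports a classic for loop that is not there.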

-
Thanks. I have some assumptions and techniques for these problems. I did it with `.*for\(.*;.*;.*\).*`. What could be wrong with this? – Tasawer Khan Nov 21 '10 at 23:25
-
@Taz You might want to allow for whitespace between the `for` and the opening parenthesis. Something like this might make it more flexible: `.*for\s*\(.*;.*;.*\).*` – Mike Clark Nov 22 '10 at 00:33
-
Thanks! All I need is to semi-manually find badly structured for loops in my source code. Way above is probably the worst accepted answer. – Bitterblue Feb 18 '20 at 09:31
You folks who are using \s in Java to detect whitespace in Java code are making at least one and maybe several mistakes.
First of all, the Java compiler’s own idea of whitespace doesn’t line up with what \s matches in Java. You may access Java’s Character.isWhitespace() through the \p{javaWhitespace} property.
Secondly, Java does not allow \s to match Unicode whitespace; as implemented in the Java Pattern class, \s only matches ASCII whitespace. In fact, Java does not support any property that corresponds to Unicode whitespace.
Here’s a table showing some of the problem areas:
                      000A    0085    00A0    2029
                      J  P    J  P    J  P    J  P
\s                    1  1    0  1    0  1    0  1
\pZ                   0  0    0  0    1  1    1  1
\p{Zs}                0  0    0  0    1  1    0  0
\p{Space}             1  1    0  1    0  1    0  1
\p{Blank}             0  0    0  0    0  1    0  0
\p{Whitespace}        -  1    -  1    -  1    -  1
\p{javaWhitespace}    1  -    0  -    0  -    1  -
\p{javaSpaceChar}     0  -    0  -    1  -    1  -
What you’re looking at on the x-axis is four different code points:
U+000A: LINE FEED (LF)
U+0085: NEXT LINE (NEL)
U+00A0: NO-BREAK SPACE
U+2029: PARAGRAPH SEPARATOR
The y-axis has eight different regex tests, mostly properties. For each of those code points, there is both a J-results column for Java and a P-results column for Perl or any other PCRE-based regex engine.
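The J columns can be checked with a few lines of Java. This is only a sketch: the class name `WhitespaceTable` and the helper `matches` are invented here, the `\p{Whitespace}` row is left out because the table marks it as unavailable in Java, and the patterns are compiled with default flags (Java versions released after this answer added a `Pattern.UNICODE_CHARACTER_CLASS` flag that changes what `\s` matches).

```java
import java.util.regex.Pattern;

class WhitespaceTable {
    // The four code points from the table: LF, NEL, NO-BREAK SPACE, PARAGRAPH SEPARATOR.
    static final int[] CODE_POINTS = {0x000A, 0x0085, 0x00A0, 0x2029};

    // The regex tests that Java accepts, written as Java string literals.
    static final String[] TESTS = {
        "\\s", "\\pZ", "\\p{Zs}", "\\p{Space}", "\\p{Blank}",
        "\\p{javaWhitespace}", "\\p{javaSpaceChar}"
    };

    /** True if the one-character string for codePoint matches regex (default flags). */
    static boolean matches(String regex, int codePoint) {
        return Pattern.matches(regex, new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        for (String test : TESTS) {
            StringBuilder row = new StringBuilder(String.format("%-20s", test));
            for (int cp : CODE_POINTS) {
                row.append(matches(test, cp) ? " 1" : " 0");
            }
            System.out.println(row);
        }
    }
}
```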
It’s a big problem. Java is just messed up, giving answers that are "wrong" according to existing practice and also according to Unicode. Plus Java doesn’t even give you access to the real Unicode properties. For the record, these are the code points with the Unicode whitespace property:
% unichars '\p{Whitespace}'
0009 CHARACTER TABULATION
000A LINE FEED (LF)
000B LINE TABULATION
000C FORM FEED (FF)
000D CARRIAGE RETURN (CR)
0020 SPACE
0085 NEXT LINE (NEL)
00A0 NO-BREAK SPACE
1680 OGHAM SPACE MARK
180E MONGOLIAN VOWEL SEPARATOR
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2007 FIGURE SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200A HAIR SPACE
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
202F NARROW NO-BREAK SPACE
205F MEDIUM MATHEMATICAL SPACE
3000 IDEOGRAPHIC SPACE
If you want, feel free to grab the unichars program and play around with it and its companion programs, uniprops and uninames. I haven’t added the Java-only properties yet, but I intend to. There are just too many nasty surprises like those described above.
For kicks and grins, would you believe there’s a \p{javaJavaIdentifierStart} property in Java? I kid you not. But you wouldn’t believe the characters the compiler actually lets you use in identifiers; really you wouldn’t. Somebody wasn’t paying attention. Again. :(

-
I really thought you made a joke about `\p{javaJavaIdentifierStart}` :| – Bart Kiers Nov 22 '10 at 07:18
-
@Bart, I *wish* I were joking, but I’m not. It gets worse than that, even. :( – tchrist Nov 22 '10 at 12:44
-
@Bart, @Mike: I just noticed that Java will return true for matching the string `"\uDC80"` against the pattern `\S` (that is, `"\\S"`, the pattern as an extra-backslashed string). This is **terrible**, because there isn’t even a character there at all. That’s an invalid surrogate; it’s not legal in isolation. Oh my oh my oh my! – tchrist Nov 22 '10 at 21:12
for ?\(.*?;.*?;.*?\)
while ?\(.+?\)
If the code's gonna be anything seriously complicated (other than asking whether this loop occurs anywhere in the code), use a parser instead.
Why do we need these `?` here? And I do need to detect that there is a comparison operator in the while loop.
If I were to leave the ? out then it would match for ( for(this;that;theother)
I updated the while loop to use +

-
thanks. this should be enough. why do we need these ? here. and I do need to detect that there is a comparison operator in while loop – Tasawer Khan Nov 21 '10 at 19:27
-
@pst: Like I said, if it's gonna be anything that complicated, use a parser. @David: You could add `(\n|^).*?` to the front of them... – J V Nov 21 '10 at 19:55
I think that the regular expressions given by JV contain an extra question mark.
Here is my version:
for\s*\([^;]*;[^;]*;[^)]*\)
while\s*\(.*?\)
is correct but
while\s*\([^)]*\)
should be faster.

For loops are the easiest to detect:
for *\(.*;.*;.*\)
While loops are a little trickier, as there are two ways to do it. If you want to use the format you specify above, this should work:
while *\(.*(<|>|<=|>=|==|!=).*\)
However, this does not detect while conditions that depend on the boolean value of a variable, nor the boolean result from a method, so this version would be a little simpler and match more:
while *\(.*\)
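A quick sketch comparing the two while patterns above on a few sample lines. The class name `WhileRegexDemo`, the helper names and the sample lines are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

class WhileRegexDemo {
    // The comparison-operator version from the answer, as a Java string literal.
    static final Pattern WHILE_WITH_COMPARISON =
            Pattern.compile("while *\\(.*(<|>|<=|>=|==|!=).*\\)");

    // The simpler version that accepts any condition.
    static final Pattern ANY_WHILE = Pattern.compile("while *\\(.*\\)");

    /** True if the line contains a while header with a comparison operator. */
    static boolean hasComparisonWhile(String line) {
        return WHILE_WITH_COMPARISON.matcher(line).find();
    }

    /** True if the line contains any while header. */
    static boolean hasAnyWhile(String line) {
        return ANY_WHILE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "while (i < 10) {",
            "while (it.hasNext()) {",
            "// meanwhile (a == b) holds"   // a comment, not a loop
        };
        for (String line : lines) {
            System.out.println(hasComparisonWhile(line) + " " + hasAnyWhile(line) + "  " + line);
        }
    }
}
```

The second line shows the case described above: a condition that is a plain boolean method call is only caught by the simpler pattern. The third line is a reminder that neither pattern knows about comments or word boundaries, so `meanwhile (a == b)` is reported as a loop by both.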

-
Thanks. I had written one for "for-loop" and was looking for exactly what you wrote for while loop. – Tasawer Khan Nov 21 '10 at 23:16
Regular expressions can only parse regular (Chomsky Type-3) languages. Java is not a regular language; it is at least context-free (Type-2), maybe even context-sensitive (Type-1).

-
@Jörg, regular expressions haven’t been REGULAR since Ken Thompson put backrefs into his backtracking NFA for `grep`. The language parsed by `/(.)\1/` is not a REGULAR language. Big deal! The modern regexes used today easily go far beyond that, including even recursion. Textbook REGULAR regular expressions are of no relevance to the real world. The ivory-tower meaning of REGULAR has always been quite irregular from a natural language perspective, so you really might as well forget about applying it to the real world. – tchrist Nov 22 '10 at 05:09
-
@tchrist, you're right, most modern day regex implementations can match non-regular languages, but you must agree that reliably parsing Java source files for `while`'s and `for`'s is not going to work. Especially not with an implementation not supporting recursive back references (like Java <1.6). – Bart Kiers Nov 22 '10 at 07:22
-
@tchrist: That's not a regular expression, that's a regex. The OP explicitly asked for regular expressions, not regex, so that's the question I am answering. – Jörg W Mittag Nov 22 '10 at 11:20
-
@Bart, yes, I agree. However, you needn’t do things all with one. That’s the mistake people often make. But there seems no reason not to use a parsing class. – tchrist Nov 22 '10 at 12:40
-
@Jörg, there is no difference between “regular expression” and “regex”. The latter is merely a shortcut abbreviation of the former. Certainly you can’t pretend there’s some nuance there that people can expect to distinguish. – tchrist Nov 22 '10 at 12:43
-
@tchrist, sure, chopping it all up in bite-sized (sub) regex-es is better, but you'd then be mimicking the behavior of a lexer. It is also my experience that people not familiar with regex hope to find a one-line-solution to their problem(s), which in this case, isn't going to work (as I know you know). – Bart Kiers Nov 22 '10 at 13:50
-
@Jörg, come on now! :) The OP specifically tagged it with `java`, so it's (IMHO) clear that it is not some theoretical question (in which case it doesn't even belong on SO). In this case it is (again IMHO) obvious that the terms *regex* and *regular expression* are equivalent. – Bart Kiers Nov 22 '10 at 13:54
-
@Bart, well, mostly. It is p̲o̲s̲s̲i̲b̲l̲e̲ ̲b̲u̲t̲ ̲n̲o̲t̲ ̲e̲x̲p̲e̲d̲i̲e̲n̲t̲, at least if I don’t have to limit the line to 80 characters. ☺ Anyway, I [happen to like lexers](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491) quite a bit. – tchrist Nov 22 '10 at 13:57
-