I'm trying to develop a regex to look for a sequence of tags inside a string. For example, I can have the tag (NP .*)
at least one time (can be multiple times), followed by a punctuation symbol (in this case a ./.
). If there's another tag between de (NP)
and the ./.
(like the VP in the example below), the Matcher must not find anything. The problem is that even if I use the question mark after the .*
it will keep looking for a )
that makes the expression match something in the string. Here is my method:
public void myMethod() {
String input = "(NP first tag) (VP second tag) ./.";
String regex = "(\\(NP .*?\\)( )?)+" + "\\./\\.";
Pattern pattern = Pattern.compile("(" + regex + ")");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println("<S "+matcher.group(0)+">");
} else {
System.out.println("sem grupos.");
}
}
The method will still match the regex, but it shouldn't. I need it to tell me that no group was found, since the "VP" tag shouldn't be there. I believe the problem relies on the greedy strategy adopted by the Regex in Java. It tries to find some combination of characters that attend to the pattern described by the regex. I have no idea on how to rewrite this expression.
Any Help?
EDITED:
1) I noticed that my question was a bit confusing, so I changed a bit the example to make it clearer.
2) Thanks Aan Moore. I Agree that I was using more groups than necessary, but this happened because of the operators like +
. I tried to cut away the unnecessary groups. Also your simple idea of replacing the .*?
by a [^)]*?
was great! The only thing that I adjusted was that I escaped the )
symbol by using [^\\)]*?
. Below I show the final REGEX used.
String regex = "(\\(NP [^\\)]*?\\) ?)+\\./\\.";
Thanks a lot! :)