0

I have to parse the output of a program to search for errors. The errors are indicated as:

[(FieldName/Value) = (phrase/What is Up John Carl?) failed rule alphanumeric] [(FieldName/Value) = (newLabel/Óscar's IPad) failed rule illegalchars]

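There can be more than one error per line, and for each error I want to retrieve three pieces: the field name, the value, and the name of the failed rule (for the first error above: phrase, What is Up John Carl? and alphanumeric). To do that, I'm building a regular expression as follows: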

static String s1 = Pattern.quote("[(FieldName/Value) = (");
static String s2 = Pattern.quote(") failed rule");
static String s3 = Pattern.quote("]");
static Pattern p = Pattern.compile(s1 + "(\\w+)/(.+)" + s2 + "(.+)" + s3);

// Assumed declarations (not shown in the original snippet):
// "line" is one line of the program's output.
Matcher matcher = p.matcher(line);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    String token = matcher.group(1);          // field name
    sb.append("#");
    sb.append(token);
    token = matcher.group(2);                 // value
    sb.append("#");
    sb.append(token);
    token = matcher.group(3).trim();          // failed rule
    sb.append("#");
    sb.append(token);
}

But the output is :

#phrase#What is Up John Carl?) failed rule alphanumeric] [(FieldName/Value) = (newLabel/Óscar's IPad#illegalchars

So it is not returning two matches, just one. It is matching the second group to the rest of the string, instead of stopping at the first "failed rule". I suppose it is due to the first (.+) in the pattern, but the thing is that anything can go in there, so I need the (.+). Any ideas how to do it?

Alan Moore
Jose L Martinez-Avial

3 Answers

1

As you can read at the end of this tutorial, (.+) is greedy: it matches the longest substring that still lets the rest of the regex match. For example, applying \((.+)\) to "(ab)(cd)" captures ab)(cd.

What you want is the reluctant quantifier (.+?) (note the ? after the +). With it, the regex matches the shortest substring that still lets the rest of the pattern match.

Applying \((.+?)\) to "(ab)(cd)" finds two matches, capturing ab and cd.
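To see the difference concretely, here is a minimal sketch that runs both quantifiers against "(ab)(cd)". It assumes the example is meant with the parentheses escaped, i.e. the patterns \((.+)\) and \((.+?)\); the class name GreedyVsReluctant and the helper groups are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: collects capture group 1 of every match of regex in input.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(ab)(cd)";
        // Greedy: .+ runs to the last ')' it can reach, so there is one match.
        System.out.println(groups("\\((.+)\\)", input));   // [ab)(cd]
        // Reluctant: .+? stops at the first ')', so there are two matches.
        System.out.println(groups("\\((.+?)\\)", input));  // [ab, cd]
    }
}
```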

Pshemo
0
Pattern p = Pattern.compile(s1+"(\\w+)/(.*?)" + s2 + "(.*?)" + s3);
Tony Zhu
    You should also take a moment to describe why this works, rather than just posting code only. – Sam Oct 12 '12 at 03:29
0

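You should be able to make the quantifiers non-greedy (reluctant) by appending "?" to them. Each group then stops at the first place where the rest of the pattern can match, instead of the last: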

static Pattern p = Pattern.compile(s1 + "(\\w+)/(.*?)" + s2 + "(.*?)" + s3);

Take a look at this other example in SO:

Non-greedy Regular Expression in Java
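Putting it together, the following is a rough, self-contained sketch of the question's loop with reluctant quantifiers, run against the sample line from the question. The class name ErrorLineParser and the method parse are invented for the example, and it uses (.+?) rather than (.*?) so that empty values are still rejected, as in the original pattern; swap in (.*?) if empty values should be allowed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorLineParser {
    static final String S1 = Pattern.quote("[(FieldName/Value) = (");
    static final String S2 = Pattern.quote(") failed rule");
    static final String S3 = Pattern.quote("]");
    // Reluctant groups: each one ends at the first following delimiter.
    static final Pattern P =
            Pattern.compile(S1 + "(\\w+)/(.+?)" + S2 + "(.+?)" + S3);

    // Hypothetical helper: returns "#field#value#rule" for every error in the line.
    static String parse(String line) {
        StringBuilder sb = new StringBuilder();
        Matcher matcher = P.matcher(line);
        while (matcher.find()) {
            sb.append('#').append(matcher.group(1));        // field name
            sb.append('#').append(matcher.group(2));        // value
            sb.append('#').append(matcher.group(3).trim()); // failed rule
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00D3 is the letter "Ó" from the question, escaped to avoid encoding issues.
        String line = "[(FieldName/Value) = (phrase/What is Up John Carl?) failed rule alphanumeric] "
                + "[(FieldName/Value) = (newLabel/\u00D3scar's IPad) failed rule illegalchars]";
        System.out.println(parse(line));
    }
}
```

With the sample line, the loop now finds two matches and appends six tokens: phrase, What is Up John Carl?, alphanumeric, newLabel, Óscar's IPad and illegalchars.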

mjuarez