
I have log text like the following:

API:ADD|TYPE:ABC...MATCH:TRUE

[LOTS OF OTHER LOG LINES]

API:ADD|TYPE:ABC...MATCH:TRUE

[LOTS OF OTHER LOG LINES]

API:ADD|TYPE:DEF...MATCH:TRUE

I tried the following regex:

(API:.*MATCH:(TRUE|FALSE))

while (matcher.find()) {
    System.out.println(i + " occurrence");
    i++;
    matches.add(matcher.group());
}

It matches from the first "API" to the last "TRUE", so only one substring is returned. I want three substrings (in this scenario), each starting at "API" and ending at the nearest following "TRUE" or "FALSE".

I'd appreciate your help with this. Thanks.

Edit:

-------------------------------------------------------
20:31:57    CALL    add     35  
-------------------------------------------------------
20:31:57    REASON  API:ADD|TYPE:ABC|ErrorType:VALIDATION|Error Message:User already has|MATCH:FALSE 
user104309
  • You need to know how many groups you want to capture with regex. Do you expect that you will have a fixed number? – Tim Biegeleisen Oct 28 '15 at 07:04
  • How are you reading the input? Line by line, or are all those lines in one String? – TheLostMind Oct 28 '15 at 07:05
  • Please show us an entire sample line along with what you want to extract. – Tim Biegeleisen Oct 28 '15 at 07:05
  • I receive it as plain text from an HTML response and have the entire content in a String. The pattern can occur any number of times; wherever it occurs in the String, I need the match in a list. – user104309 Oct 28 '15 at 07:12
  • `.*` wouldn't match across new lines so I believe OP has buffered the content into a string without the new lines. – Ravi K Thapliyal Oct 28 '15 at 07:13
  • @user104309 If you print out the HTML response do you see any new lines? – Ravi K Thapliyal Oct 28 '15 at 07:13
  • You need to show us an entire line so we know what to work with here. – Tim Biegeleisen Oct 28 '15 at 07:17
  • @TimBiegeleisen Please see my edit. I guess all new lines are removed while storing in String (jsoup) – user104309 Oct 28 '15 at 08:49
  • How are you reading in this log file? From your example, it appears that you are reading all 3 lines into a single string. If that is the case, then the regex isn't the problem. Please clarify this point for us. – Tim Biegeleisen Oct 28 '15 at 08:53
  • @user104309 The solution is very simple. If you (or jsoup) aren't preserving new lines use `.*?`, otherwise make sure the newlines are not stripped and your current regex should then work without any issues provided your pattern never occurs more than once per line. – Ravi K Thapliyal Oct 28 '15 at 08:59

2 Answers


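Use the non-greedy (reluctant) quantifier .*? so that the regex matches as little as possible: each match then ends at the first MATCH:TRUE or MATCH:FALSE after an API: rather than at the last one in the string.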

(API:.*?MATCH:(TRUE|FALSE))
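As a minimal sketch of how this pattern could be used with the loop from the question, the example below collects every match into a list. The class name `ReluctantMatchDemo`, the helper `findEntries`, and the sample log text are hypothetical and only for illustration; it also assumes the whole log is held in one String without newlines, as described in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantMatchDemo {
    // Reluctant .*? ends each match at the first MATCH:TRUE/FALSE after API:.
    // (?:...) is a non-capturing group; only the whole match is needed here.
    private static final Pattern ENTRY =
            Pattern.compile("API:.*?MATCH:(?:TRUE|FALSE)");

    // Hypothetical helper: returns every API...MATCH substring found in the log.
    static List<String> findEntries(String log) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = ENTRY.matcher(log);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample input modelled on the lines in the question.
        String log = "API:ADD|TYPE:ABC|MATCH:TRUE [other log lines] "
                + "API:ADD|TYPE:ABC|MATCH:TRUE [other log lines] "
                + "API:ADD|TYPE:DEF|MATCH:FALSE";
        List<String> entries = findEntries(log);
        for (int i = 0; i < entries.size(); i++) {
            System.out.println((i + 1) + ": " + entries.get(i));
        }
    }
}
```

If the String does keep its newlines and one entry can span several lines, `Pattern.DOTALL` would be needed for `.` to cross line breaks; whether that applies depends on how the text is extracted.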
Ravi K Thapliyal

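This happens because .* is a greedy quantifier: it consumes as much as possible and only backtracks as far as needed, so the match runs to the last MATCH:TRUE or MATCH:FALSE in the string.

You should use .*? instead, which is reluctant: from each starting position it consumes as few characters as possible, so the match ends at the first MATCH:TRUE or MATCH:FALSE it reaches.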

For more info about greedy vs. reluctant see the excellent answers here: Greedy vs. Reluctant vs. Possessive Quantifiers
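To make the difference concrete, here is a small sketch that counts matches for the greedy and the reluctant version of the pattern on the same input. The class `GreedyVsReluctant`, the helper `countMatches`, and the input string are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up input with two entries on a single line.
        String input = "API:A|MATCH:TRUE xx API:B|MATCH:FALSE";

        // Greedy: .* runs to the end and backtracks to the last MATCH, giving one match.
        System.out.println(countMatches("API:.*MATCH:(TRUE|FALSE)", input));  // prints 1

        // Reluctant: .*? stops at the first MATCH after each API:, giving two matches.
        System.out.println(countMatches("API:.*?MATCH:(TRUE|FALSE)", input)); // prints 2
    }
}
```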

Stefan Winkler