
I am parsing a log file with 3 variations of a line and I'm trying to build a regex that matches and groups all variations.

Here are the line variations:

StatementId: [12345], UserId: 8756

StatementId: 12345, UserId: 8756

StatementId: [12345,6789], UserId: 8756

The current expression I have matches all cases, except #3.

I am expecting 2 groups. Using the lines above, the first group would be either 12345 or 12345,6789. The second group would simply be 8756.

The problem I'm having is with line variation #3: the closing bracket ] is being included in the first matching group.

Thus for line #3 the first group result is:

12345,6789]

I'm using this site for testing:

https://regex101.com/

Here is my regex:

(?:StatementId: \[?)(.*)(?:\]?, .*UserId: )([0-9]*)
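
The symptom can be reproduced outside regex101 with a minimal sketch. Python's `re` module is an assumption made purely for illustration, and the names `PATTERN` and `LINE_3` are hypothetical. It applies the expression above, unchanged, to line variation #3:

```python
import re

# The expression from the question, unchanged.
PATTERN = re.compile(r"(?:StatementId: \[?)(.*)(?:\]?, .*UserId: )([0-9]*)")

# Line variation #3.
LINE_3 = "StatementId: [12345,6789], UserId: 8756"

match = PATTERN.search(LINE_3)

# The greedy (.*) grabs as much as it can, and the optional \]? that
# follows is allowed to match nothing, so the bracket ends up in group 1.
print(match.group(1))  # 12345,6789]
print(match.group(2))  # 8756
```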

What am I doing wrong?

EDIT:

I've attempted the suggested non-greedy solution(s) in several variations but that doesn't appear to solve the problem.

The expression variations I've tried will work on a single line variation, but not on all 3.

SOLUTION:

sln in the comments had 2 suggested solutions, both of which work.

rcurrie
  • Use `.*?` instead of `.*`. – Wiktor Stribiżew Sep 09 '16 at 16:39
  • Get rid of the confusion: `StatementId: \[?(.*?)\]?,.*UserId: ([0-9]*)` –  Sep 09 '16 at 16:44
  • Thanks @Wiktor, but I've tried that and it does not resolve the issue of the closing bracket always being included in the first group, at least when using the regex101.com testing site. – rcurrie Sep 09 '16 at 16:46
  • @sln that expression only appears to match the first statementId in the comma delimited list for group 1 – rcurrie Sep 09 '16 at 16:49
  • Use a branch reset or use an extra capture group `StatementId:[ ](?|\[([^\[\]]*)\]|(.*?)),.*UserId:[ ]([0-9]*)` https://regex101.com/r/gX8qK6/1 or https://regex101.com/r/gX8qK6/2 –  Sep 09 '16 at 16:52
  • Thanks @sln that works great! – rcurrie Sep 09 '16 at 16:58
  • @sln Please post your solutions. – Wiktor Stribiżew Sep 09 '16 at 17:12
  • Other possible way: `StatementId: \[?\K[^]\s]*[^]\s,]` https://regex101.com/r/gE9nV1/1 or more simple `StatementId: \[?\K[0-9,]*[0-9]` https://regex101.com/r/gE9nV1/3 – Casimir et Hippolyte Sep 09 '16 at 17:28
  • @Casimir that works as well thanks! The one tweak I didn't mention is that sometimes there will be a space between the statement ids, but just a small change to your solution handles that as well: `StatementId: \[?\K[0-9, ?]*[0-9]` – rcurrie Sep 09 '16 at 17:47
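
The `\K` idea from the last two comments can be sketched in an engine that lacks `\K`, such as Python's `re`, by swapping in a capture group. This is an illustrative variant, not the commenters' exact expressions: the names `STATEMENT_IDS` and `statement_ids` are hypothetical, and the `?` is left out of the character class on the assumption that it was meant as "optional space" (inside `[...]` it would match a literal question mark).

```python
import re

# Capture-group stand-in for \K: group 1 holds what \K would have kept.
# The class allows digits, commas and spaces; the trailing [0-9] forces
# the capture to end on a digit, so "]" and ", " are never included.
STATEMENT_IDS = re.compile(r"StatementId: \[?([0-9, ]*[0-9])")


def statement_ids(line):
    """Return the raw statement-id list from a log line, or None if absent."""
    m = STATEMENT_IDS.search(line)
    return m.group(1) if m else None


print(statement_ids("StatementId: [12345], UserId: 8756"))        # 12345
print(statement_ids("StatementId: 12345, UserId: 8756"))          # 12345
print(statement_ids("StatementId: [12345,6789], UserId: 8756"))   # 12345,6789
print(statement_ids("StatementId: [12345, 6789], UserId: 8756"))  # 12345, 6789
```

Like the `\K` versions, this only extracts the statement ids; the UserId would need a second group or a second expression.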

1 Answer

You can either use a branch reset, (?|...), which reuses the same capture group numbers in each of its alternatives
http://www.regex101.com/r/gX8qK6/1

StatementId:[ ](?|\[([^\[\]]*)\]|(.*?)),.*UserId:[ ]([0-9]*)

 StatementId: [ ] 
 (?|
      \[
      ( [^\[\]]* )                  # (1)
      \]
   |  
      ( .*? )                       # (1)
 )
 , .* UserId: [ ] 
 ( [0-9]* )                    # (2)

or,

skip the branch reset and use a plain non-capturing group (?:...), which gives
each of the two cases (with and without []) its own capture group
http://www.regex101.com/r/gX8qK6/2

(Note: in this case the two captures are mutually exclusive; only one of groups 1 and 2 takes part in any given match.
This means you can blindly concatenate groups 1 & 2 to form the string.
)

StatementId:[ ](?:\[([^\[\]]*)\]|(.*?)),.*UserId:[ ]([0-9]*)

 StatementId: [ ] 
 (?:
      \[
      ( [^\[\]]* )                  # (1)
      \]
   |  
      ( .*? )                       # (2)
 )
 , .* UserId: [ ] 
 ( [0-9]* )                    # (3)
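
As a usage sketch, here is the extra-capture-group expression applied to the three sample lines. Python is assumed only for illustration, and `PATTERN` and `parse` are hypothetical names. Python's built-in `re` module does not support branch reset, so only this second form is shown. In Python a group that did not take part in the match is returned as None rather than an empty string, so the sketch picks whichever of groups 1 and 2 is set instead of concatenating them.

```python
import re

# Second expression from the answer, unchanged.
PATTERN = re.compile(
    r"StatementId:[ ](?:\[([^\[\]]*)\]|(.*?)),.*UserId:[ ]([0-9]*)"
)


def parse(line):
    """Return (statement_ids, user_id) for a log line, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 1 is set for the bracketed form, group 2 for the bare form.
    ids = m.group(1) if m.group(1) is not None else m.group(2)
    return ids, m.group(3)


for line in (
    "StatementId: [12345], UserId: 8756",
    "StatementId: 12345, UserId: 8756",
    "StatementId: [12345,6789], UserId: 8756",
):
    print(parse(line))
# ('12345', '8756')
# ('12345', '8756')
# ('12345,6789', '8756')
```
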
  • One final comment on this. I was using this in a small Java application and I ultimately had to use solution #2 (extra capture group) because I was getting this exception with solution #1: `Caused by: java.util.regex.PatternSyntaxException: Unknown inline modifier near index 18 StatementId:[ ](?|\[([^\[\]]*)\]|(.*?)),.*UserId:[ ]([0-9]*)`, which is referencing the pipe character | in the branch reset clause. – rcurrie Sep 09 '16 at 18:10