I have a String that I want to parse. It looks like this:

00:0qwe8.0 donald controller duck [02009&123@##]: Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]

Notice that the last square bracket contains a colon (:), and the character just before "Some more" is also a colon. I want to capture all the characters between them.

Currently I parse it in two steps with the following regexes. Here is the Java code.

class JavaReg{

 public static void main(String[] args){

   String str = "00:0qwe8.0 donald controller duck [02009&123@##]: Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]";
   String[] strArr = str.split("\\[.*?\\]\\:\\s");
   String[] str12 = strArr[1].split("\\[\\w*?\\:.*");
   for(String strinj : strArr)
      System.out.println(strinj);

   System.out.println(str12[0]);
 }

}

The following is the result of the above exercise.

00:0qwe8.0 donald controller duck
Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]
Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#]

The last string is what I want. It starts right after the colon and runs up to the square bracket that contains a colon.

The question is: can I use capturing groups in a regex to capture it in one shot? How can I do that in Java?

John Doe
  • Possible duplicate of [How to extract a substring using regex](https://stackoverflow.com/questions/4662215/how-to-extract-a-substring-using-regex) – pringi Mar 27 '19 at 11:15
  • No, this is not duplicate. It has its own specific uniqueness. – John Doe Mar 27 '19 at 14:47

2 Answers

You may use the following regex to extract the match:

\[[^\]\[]*\]:\s*(.*?)\[\w*:

See the regex demo.

Details

  • \[ - a [ char
  • [^\]\[]* - 0+ chars other than ] and [
  • \]: - a ]: substring
  • \s* - 0+ whitespaces
  • (.*?) - Group 1: any 0 or more chars other than line break chars, as few as possible
  • \[ - [ char
  • \w* - 0+ letters, digits or _
  • : - a colon.
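
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Java's Pattern.COMMENTS flag is acceptable in your code base; the class name CommentedRegexDemo, the constant ANNOTATED and the helper extract are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the same pattern as above, annotated piece by piece.
class CommentedRegexDemo {

    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // everything from '#' to the end of a line is treated as a comment.
    static final Pattern ANNOTATED = Pattern.compile(
          "\\[          # a literal opening bracket\n"
        + "[^\\]\\[]*   # 0+ chars other than the two square brackets\n"
        + "\\]:         # a closing bracket followed by a colon\n"
        + "\\s*         # 0+ whitespace chars\n"
        + "(.*?)        # group 1: as few chars as possible\n"
        + "\\[\\w*:     # an opening bracket, 0+ word chars, then a colon\n",
        Pattern.COMMENTS);

    // Returns group 1 of the first match, or null when nothing matches.
    static String extract(String input) {
        Matcher m = ANNOTATED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "00:0qwe8.0 donald controller duck [02009&123@##]: Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]";
        System.out.println(extract(str));
    }
}
```

Returning null for "no match" is just one possible choice here; an Optional<String> would work equally well.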

Use it with Matcher#find() and grab matcher.group(1), see the Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "00:0qwe8.0 donald controller duck [02009&123@##]: Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]";
Pattern pattern = Pattern.compile("\\[[^\\]\\[]*\\]:\\s*(.*?)\\[\\w*:");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    // Group 1 holds the text between "]: " and the bracket containing a colon
    System.out.println(matcher.group(1));
}
// => Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#]
Wiktor Stribiżew
  • Yeah man, both of the answers are good, I was also in dilemma, the other one came earlier so I picked it. But Thanks Sir. – John Doe Mar 27 '19 at 11:56
  • @JohnDoe My answer was posted at 11:18:32 and the other one was posted at 11:18:40. My answer was posted earlier. – Wiktor Stribiżew Mar 27 '19 at 11:57
  • Done dude done, but please explain this one from your answer \[[^\]\[]*\] – John Doe Mar 27 '19 at 11:59
  • @JohnDoe It is a [negated character class](https://www.regular-expressions.info/charclass.html#negated) that matches any single char that does not belong to the specified char set. So, it matches any char but `[` and `]`. These inside must be escaped since they are used to form character class unions and intersections. – Wiktor Stribiżew Mar 27 '19 at 12:02
  • So you could have written [^ \\] \\[ ]* like this as well [^\\[ \\] ]* . – John Doe Mar 27 '19 at 12:11
  • @JohnDoe `[^\]\[]` can be written as `[^\[\]]`. That is all. – Wiktor Stribiżew Mar 27 '19 at 12:12

You can use this single regex to capture the string you want:

(?<=\]: ).*(?=\[)

Here, the positive lookbehind (?<=\]: ) asserts that the match starts right after a literal "]: " (closing bracket, colon, space). From that point, .* greedily matches everything up to the last [ in the line, which in your data is the bracket that contains a colon. Since that is exactly where you wanted the capture to end, I did not add any further constraint on the colon.

If you also want to enforce that the match stops just before a [something1:something2] bracket, you can use this regex:

(?<=\]: ).*(?=\[[^[\]]*:[^[\]]*\])

The Java code

String s = "00:0qwe8.0 donald controller duck [02009&123@##]: Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#][15b:31013]";
Pattern p = Pattern.compile("(?<=\\]: ).*(?=\\[[^\\[\\]]*:[^\\[\\]]*\\])");
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println(m.group());
}

prints:

Some more sring here Model number 420 Family [Super-cool] [15b31013^^@#]
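
On the sample string both lookaheads give the same result, so the difference only shows up when another bracket follows the one with the colon. Here is a small sketch of that case; the class name LookaheadComparison, the helper firstMatch and the shortened input "a [x]: keep [k1][p:q][tail]" are made up for illustration and are not part of the original data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two lookahead variants from this answer.
class LookaheadComparison {

    // Stops before the last "[" of any kind.
    static final Pattern ANY_BRACKET =
        Pattern.compile("(?<=\\]: ).*(?=\\[)");

    // Stops before the last "[...:...]" bracket, i.e. one that contains a colon.
    static final Pattern COLON_BRACKET =
        Pattern.compile("(?<=\\]: ).*(?=\\[[^\\[\\]]*:[^\\[\\]]*\\])");

    // Returns the first match of the pattern, or null when there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up input with an extra bracket after the one containing a colon.
        String sample = "a [x]: keep [k1][p:q][tail]";
        System.out.println(firstMatch(ANY_BRACKET, sample));   // keep [k1][p:q]
        System.out.println(firstMatch(COLON_BRACKET, sample)); // keep [k1]
    }
}
```

If your lines can end with a trailing bracket that has no colon, the second pattern is likely the safer pick.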
Pushpesh Kumar Rajwanshi
  • (.*) I can understand that you are using this to capture the group, but please explain how your regex works. – John Doe Mar 27 '19 at 11:21
  • Thanks but please clarify this one a little bit more (?<=\]: ) – John Doe Mar 27 '19 at 11:30
  • `(?<=]: )` is called a positive lookbehind expression, and it marks the point that is preceded by exactly the literal `]: `. Notice it has `]`, `:` and a space. [Check the pink pointer in this regex](https://regex101.com/r/kd9o0E/4) which shows the beginning of capture. – Pushpesh Kumar Rajwanshi Mar 27 '19 at 11:34
  • So this is (?=\[) positive lookahead which matches the last [ (square bracket) correct? – John Doe Mar 27 '19 at 11:45
  • Yes, not `(?=[)` but `(?=\[)`. Escaping `[` is necessary as `[` has a special meaning in regex. But in case you want the match to stop just before a `[something:something]`, an even better positive lookahead is `(?=\[[^[\]]*:[^[\]]*\])`, which ensures it only stops at a bracket containing a colon inside, whether or not that bracket is the last one. [Check the pink pointer in this regex](https://regex101.com/r/kd9o0E/6) which shows the end mark till where the data will be captured. – Pushpesh Kumar Rajwanshi Mar 27 '19 at 11:47