I'm trying to get the first group of a regex pattern. I got this string from a lyric text:

[01:34][01:36]Blablablahh nanana

I'm using this regex pattern to extract [01:34], [01:36] and the text.

Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");

But when I try to extract the first group, [01:34], using group(1), it returns [01:36].

Is there something wrong with the regex pattern?

Ricardo

3 Answers

Your problem is here:

Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");
                                      ^

This part of your pattern, (\\[\\d\\d:\\d\\d\\])+, matches [01:34][01:36] because of the + (which is greedy). A capturing group can hold only one value, though, so each repetition overwrites the previous one and group 1 ends up holding the last [dd:dd] that was matched.
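
To see the overwriting behaviour in isolation, here is a minimal sketch that runs the pattern from the question against the sample line. The class and method names (RepeatedGroupDemo, lastRepeatedGroup) are made up for illustration; only the pattern and the input come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Original pattern from the question: group 1 sits under a + quantifier.
    private static final Pattern TIME_LINE =
            Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");

    // Returns whatever group 1 holds after a full match, or null if the line does not match.
    static String lastRepeatedGroup(String line) {
        Matcher m = TIME_LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Group 1 was filled twice; only the second value survives.
        System.out.println(lastRepeatedGroup("[01:34][01:36]Blablablahh nanana")); // prints [01:36]
    }
}
```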

If you only want [01:34], you can fix your pattern by removing the +. But you can also use a simpler pattern:

Pattern.compile("^\\[\\d\\d:\\d\\d\\]");

and read the match with group(0), which is also what the no-argument group() returns.

Pattern timeLine = Pattern.compile("^\\[\\d\\d:\\d\\d\\]");
Matcher m = timeLine.matcher("[01:34][01:36]Blablablahh nanana");
while (m.find()) {
    System.out.println(m.group()); // prints [01:34]
}

If you want to extract both timestamps, [01:34][01:36], you can wrap the repeated group in another pair of parentheses:

Pattern.compile("((\\[\\d\\d:\\d\\d\\])+)(.*)");

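This way the entire match of (\\[\\d\\d:\\d\\d\\])+ will be in group 1, and the text captured by (.*) moves to group 3.

You can also achieve this by removing (.*) from your original pattern and reading group 0. In that case use find() or lookingAt() rather than matches(), because matches() requires the pattern to cover the whole input.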
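
As a rough check of the nested-group variant, the sketch below returns all three groups for the sample line. The wrapper names (NestedGroupDemo, groups) are hypothetical; the pattern is the one shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupDemo {
    // Outer group 1 wraps the whole repetition; inner group 2 is still overwritten per repetition.
    private static final Pattern TIME_LINE =
            Pattern.compile("((\\[\\d\\d:\\d\\d\\])+)(.*)");

    // Returns { group 1, group 2, group 3 }, or null if the line does not match.
    static String[] groups(String line) {
        Matcher m = TIME_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("[01:34][01:36]Blablablahh nanana");
        System.out.println(g[0]); // prints [01:34][01:36]
        System.out.println(g[1]); // prints [01:36]
        System.out.println(g[2]); // prints Blablablahh nanana
    }
}
```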

Pshemo

I think you are confused by the repeating match (\\[\\d\\d:\\d\\d\\])+, which returns just the last match as the group value. Try the following and see if it makes more sense to you:

    String s = "[01:34][01:36]Blablablahh nanana";
    Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])(\\[\\d\\d:\\d\\d\\])(.+)");
    Matcher m = timeLine.matcher(s);
    if (m.matches()) {
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.printf("    Group %d -> %s\n", i, m.group(i)); // prints one line per group
        }
    }

which for me returns:

Group 1 -> [01:34]
Group 2 -> [01:36]
Group 3 -> Blablablahh nanana
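
The pattern above hard-codes exactly two timestamps. For lines with any number of them, one possible alternative (not part of the original answer) is to match a single timestamp and loop with find(). In this sketch the \G anchor keeps the loop on the consecutive timestamps at the start of the line; TimestampLoopDemo, timestamps and text are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampLoopDemo {
    // \G matches at the start of input, then at the end of the previous match,
    // so only the unbroken run of leading timestamps is found.
    private static final Pattern STAMP = Pattern.compile("\\G\\[\\d\\d:\\d\\d\\]");

    // Collects each leading timestamp as its own list element.
    static List<String> timestamps(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = STAMP.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns everything after the last leading timestamp.
    static String text(String line) {
        Matcher m = STAMP.matcher(line);
        int end = 0;
        while (m.find()) {
            end = m.end();
        }
        return line.substring(end);
    }

    public static void main(String[] args) {
        String s = "[01:34][01:36]Blablablahh nanana";
        System.out.println(timestamps(s)); // prints [[01:34], [01:36]]
        System.out.println(text(s));       // prints Blablablahh nanana
    }
}
```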
rolfl

I would simply grab the first part using a character class:

String timings = str.replaceAll("([\\[\\]\\d:]+).*", "$1");

And similarly the text:

String text = str.replaceAll("^[\\[\\]\\d:]+", "");
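
Put together as a runnable sketch, the two calls might look like this. The class and method names are invented for the example, and both patterns are anchored with ^ here as an added assumption, so that digits or colons inside the lyric itself are left alone.

```java
class ReplaceAllDemo {
    // Keeps only the leading run of '[', ']', digits and ':' characters.
    static String timings(String str) {
        return str.replaceAll("^([\\[\\]\\d:]+).*", "$1");
    }

    // Strips that same leading run and keeps the rest of the line.
    static String text(String str) {
        return str.replaceAll("^[\\[\\]\\d:]+", "");
    }

    public static void main(String[] args) {
        String str = "[01:34][01:36]Blablablahh nanana";
        System.out.println(timings(str)); // prints [01:34][01:36]
        System.out.println(text(str));    // prints Blablablahh nanana
    }
}
```

Note that the character class does not check the [dd:dd] shape, so this is only suitable when the lines are known to start with well-formed timestamps.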
Bohemian