
I am trying to parse an SRT subtitle file with this code:

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MatchArray {

        public static void main(String[] args) {

            File file = new File(
                    "C:/Users/Thiago/workspace/SubRegex/src/Dirty Harry VOST - Clint Eastwood.srt");

            try {
                Scanner in = new Scanner(file);

                try {
                    // Read the whole file into one string
                    // (nextLine() drops each line's terminator).
                    String contents = in.nextLine();
                    while (in.hasNextLine()) {
                        contents = contents + "\n" + in.nextLine();
                    }

                    String pattern = "([\\d]+)\r([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})\r(([^|\r]+(\r|$))+)";

                    Pattern r = Pattern.compile(pattern);

                    // Now create matcher object.
                    Matcher m = r.matcher(contents);

                    ArrayList<String> start = new ArrayList<String>();
                    while (m.find()) {
                        start.add(m.group(1));
                        start.add(m.group(2));
                        start.add(m.group(3));
                        start.add(m.group(4));
                        start.add(m.group(5));
                        start.add(m.group(6));
                        start.add(m.group(7));

                        System.out.println(start);
                    }
                } finally {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

But when I execute it, it doesn't capture any group. When I try to capture only the times, with this pattern:

([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})

It works. So how do I make it capture the entire subtitle?
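A likely cause, offered as a hedged diagnosis: `Scanner.nextLine()` returns each line without its line terminator, and the loop rejoins the lines with `"\n"`, so `contents` contains no `\r` at all. The `\r` that the full pattern requires right after the id can therefore never match, while the time-only pattern contains no `\r`, which would explain why it works. The sketch below swaps every `\r` in the full pattern for `\n` and runs it on a hard-coded sample built from the example in the comments. The class name `SrtNewlineFix` and the `parse` helper are hypothetical; group numbering is unchanged, so group 6 holds the subtitle text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrtNewlineFix {
    // The question's pattern with every \r replaced by \n
    // ([\\d] is also simplified to \\d, which matches the same characters).
    static final Pattern ENTRY = Pattern.compile(
            "(\\d+)\\n(\\d{2}:\\d{2}:\\d{2}),(\\d{3})\\s*-->\\s*"
            + "(\\d{2}:\\d{2}:\\d{2}),(\\d{3})\\n(([^|\\n]+(\\n|$))+)");

    // Hypothetical helper: one "id | start --> end | text" string per entry found.
    static List<String> parse(String contents) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(contents);
        while (m.find()) {
            // Group 6 is all text lines of the entry; trim() drops the trailing \n.
            entries.add(m.group(1) + " | " + m.group(2) + "," + m.group(3)
                    + " --> " + m.group(4) + "," + m.group(5)
                    + " | " + m.group(6).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Shaped like the question's contents string: lines joined with "\n", no "\r".
        String contents = "1\n00:05:29,384 --> 00:05:30,974\nJesus!\n\n"
                + "2\n00:05:31,422 --> 00:05:33,376\nTo the city of San Francisco.\n";
        parse(contents).forEach(System.out::println);
    }
}
```

Running `main` prints one line per entry, starting with `1 | 00:05:29,384 --> 00:05:30,974 | Jesus!`. The blank line between entries is what stops group 6, because `[^|\n]+` needs at least one non-newline character.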

aleroot
  • possible duplicate of [Java API for SRT subtitles](http://stackoverflow.com/questions/5062914/java-api-for-srt-subtitles), look at second answer for correct regex. – Rossiar Aug 31 '13 at 05:55
  • Thanks @Rossiar, I had already tried that one, but I thought it had too many groups, and that this one would be faster if I could make it work. – user2719931 Aug 31 '13 at 06:23
  • Can you please post an example input line and an example of the captured group? – aleroot Aug 31 '13 at 06:26
  • Input: 1 00:05:29,384 --> 00:05:30,974 Jesus! 2 00:05:31,422 --> 00:05:33,376 To the city of San Francisco. What do you mean by "captured group"? I only need to separate the ID, the start time, the end time and the text into different groups. – user2719931 Aug 31 '13 at 06:29
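For the four fields described in the last comment (ID, start time, end time, text), one alternative, swapped in here and not taken from the thread, is to split the file into entries at blank lines and match each entry on its own with an anchored pattern. This keeps multi-line text in one group and, after normalizing line endings, works whether the file uses `\r\n` or `\n`. `SrtBlocks`, `Entry` and `parse` are hypothetical names, and the sample is the one from the comment with the second subtitle split over two lines for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrtBlocks {
    // Hypothetical holder for the four fields: id, start time, end time, text.
    static final class Entry {
        final int id;
        final String start;
        final String end;
        final String text;

        Entry(int id, String start, String end, String text) {
            this.id = id;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return id + " [" + start + " -> " + end + "] " + text;
        }
    }

    // One entry: id line, timing line, then all remaining text lines.
    private static final Pattern BLOCK = Pattern.compile(
            "(\\d+)\\n(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s*-->\\s*"
            + "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\n([\\s\\S]+)");

    static List<Entry> parse(String contents) {
        // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
        String normalized = contents.replace("\r\n", "\n").replace('\r', '\n');
        List<Entry> entries = new ArrayList<>();
        // Entries are separated by one or more blank lines.
        for (String block : normalized.trim().split("\\n\\s*\\n")) {
            Matcher m = BLOCK.matcher(block.trim());
            if (m.matches()) { // skip anything that does not look like an SRT entry
                entries.add(new Entry(Integer.parseInt(m.group(1)),
                        m.group(2), m.group(3), m.group(4)));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String srt = "1\r\n00:05:29,384 --> 00:05:30,974\r\nJesus!\r\n\r\n"
                + "2\r\n00:05:31,422 --> 00:05:33,376\r\nTo the city\r\nof San Francisco.\r\n";
        for (Entry e : parse(srt)) {
            System.out.println(e);
        }
    }
}
```

The trade-off is two passes (split, then match) instead of one `find()` loop, in exchange for a much simpler per-entry pattern.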

1 Answer


I don't fully understand your requirement, but I think this may help. Please try this regex:

(\\d+?)\\s*(\\d+?:\\d+?:\\d+?,\\d+?)\\s+-->\\s+(\\d+?:\\d+?:\\d+?,\\d+?)\\s+(.+)

I tried it on http://www.myregextester.com/index.php and it worked.
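A minimal Java sketch of how the regex above might be used with `Pattern` and `Matcher`, run on a hypothetical sample shaped like the one in the comments (the class name `AnswerRegexDemo` and the `parse` helper are illustrative only). One behavior to be aware of: `.` does not match line terminators by default, so group 4 holds only the first line of each subtitle's text. In this sample, the second entry's text comes back as `To the city` and `of San Francisco.` is not captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerRegexDemo {
    // The answer's regex, unchanged.
    static final Pattern ENTRY = Pattern.compile(
            "(\\d+?)\\s*(\\d+?:\\d+?:\\d+?,\\d+?)\\s+-->\\s+(\\d+?:\\d+?:\\d+?,\\d+?)\\s+(.+)");

    // Collects groups 1-4 (id, start, end, first text line) of every match.
    static List<String[]> parse(String contents) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ENTRY.matcher(contents);
        while (m.find()) {
            rows.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the second subtitle's text spans two lines.
        String contents = "1\n00:05:29,384 --> 00:05:30,974\nJesus!\n\n"
                + "2\n00:05:31,422 --> 00:05:33,376\nTo the city\nof San Francisco.\n";
        for (String[] row : parse(contents)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```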

I hope this helps.

Omer Sonmez