This is a regex question for which I couldn't find an answer yet:

Input:

"the current time is <start time>00:00:00<end time>. at 00:00:00 there is a firework. Another appearance of 00:00:00."

Desired output:

"the current time is <start time>00:00:00<end time>. at <start time>00:00:00<end time> there is a firework. Another appearance of <start time>00:00:00<end time>."

The solution must not involve first splitting the string by sentence.

What I tried:

A simple input.replace(group, replace) won't work because there is already a match that shouldn't be replaced.

    public static void main(String[] args) throws ParseException
    {
       String input = "the current time is <start time>00:00:00<end time>. at 00:00:00 there is a firework. Another appearance of 00:00:00.";
       Pattern p  = Pattern.compile("(<start time>)?(00:00:00)(<end time>)?");
       Matcher m  = p.matcher(input);
       while(m.find())
       {
            if(m.group(1) != null) { continue; }
            String substr1 = input.substring(0, m.start(2));
            String substr2 = input.substring(m.end(2), input.length());
            String repl = "<start time>" + m.group(2) + "<end time>";
            input = substr1 + repl + substr2;
       }
    }
Sync
tenticon

2 Answers

The reason your code isn't working is that you're reassigning input within the loop. The Matcher keeps scanning the original string, so the indexes it reports (m.start(2), m.end(2)) no longer line up with the new, longer input.
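
If you did want to keep an explicit loop, one way to avoid the stale indexes is to leave input untouched and build the result in a separate StringBuilder. The following is only a sketch under that assumption; the class name WrapWithLoop, the helper wrap, and the variable last are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapWithLoop {
    // Hypothetical helper: wraps each bare 00:00:00, skips ones preceded by <start time>.
    static String wrap(String input) {
        Pattern p = Pattern.compile("(<start time>)?(00:00:00)(<end time>)?");
        Matcher m = p.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // index into the ORIGINAL input up to which text has been copied
        while (m.find()) {
            if (m.group(1) != null) {
                continue; // already wrapped; it is copied verbatim by a later append
            }
            out.append(input, last, m.start(2)); // untouched text before the match
            out.append("<start time>").append(m.group(2)).append("<end time>");
            last = m.end(2);
        }
        out.append(input, last, input.length()); // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "the current time is <start time>00:00:00<end time>. at 00:00:00 there is a firework. Another appearance of 00:00:00.";
        System.out.println(wrap(input));
    }
}
```

Because input is never reassigned, every index the Matcher reports still refers to the string being read.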

But the good news is that you don't need the loop at all. You can combine a negative lookbehind with a negative lookahead to skip the instances that already have the wrapper, and use replaceAll to do the looping for you:

    public static void main(String[] args) throws Exception
    {
       String input = "the current time is <start time>00:00:00<end time>. at 00:00:00 there is a firework. Another appearance of 00:00:00.";
       String result = input.replaceAll("(?<!<start time>)00:00:00(?!<end time>)", "<start time>00:00:00<end time>");
       // Negative lookbehind -----------^^^^^^^^^^^^^^^^^        ^^^^^^^^^^^^^^
       // Negative lookahead ------------------------------------/
       System.out.println(result);
    }

The negative lookbehind says "don't match if the text has this in front of it" and the negative lookahead says "don't match if the text has this after it."
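
The same two assertions should also work when the time is not the literal 00:00:00. The sketch below assumes times always have the form HH:MM:SS with two digits per field, which the question does not state; the class name WrapAnyTime and the helper wrap are hypothetical. In the replacement string, $0 refers to the whole matched text.

```java
public class WrapAnyTime {
    // Hypothetical helper: wraps any bare dd:dd:dd, assuming that is the time format.
    static String wrap(String input) {
        return input.replaceAll(
            "(?<!<start time>)\\d{2}:\\d{2}:\\d{2}(?!<end time>)",
            "<start time>$0<end time>"); // $0 = the matched time
    }

    public static void main(String[] args) {
        System.out.println(wrap("at 12:34:56 and <start time>01:02:03<end time>"));
    }
}
```

Note that, with both assertions in place, a time that has only one of the two wrappers (for example only a leading <start time>) is left as it is rather than completed.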

T.J. Crowder

Lookahead and lookbehind assertions can help you.

Negative lookbehind: "(?<!start)text" matches "footext" but not "starttext".

Negative lookahead: "text(?!end)" matches "textfoo" but not "textend".

Applying this to your case results in: "(?<!<start time>)(00:00:00)(?!<end time>)".
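
The small examples above can be checked with a short sketch like the one below. The class name LookaroundDemo and the helper found are hypothetical. It uses Matcher.find() rather than String.matches(), since matches() requires the pattern to cover the entire string and "matches" is meant here in the sense of "finds a match in".

```java
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: true if the regex finds a match anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Negative lookbehind: "text" must not be preceded by "start".
        System.out.println(found("(?<!start)text", "footext"));
        System.out.println(found("(?<!start)text", "starttext"));
        // Negative lookahead: "text" must not be followed by "end".
        System.out.println(found("text(?!end)", "textfoo"));
        System.out.println(found("text(?!end)", "textend"));
    }
}
```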

Wiktor Stribiżew
Micha Wiedenmann