2

I'm new to regex and have been trying to work this out on my own but I don't seem to get it working. I have an input that contains start and end flags and I want to replace a certain char, but only if it's between the flags.

For example, say the start flag is START, the end flag is END, and the char I'm trying to replace is ", which should become \".

I would call input.replaceAll(regex, "\\\\\"");

I tried writing a regex that matches only the right " chars, but so far I have only managed to match everything between the flags rather than just the " chars: (?<=START)(.*)(?=END)

Example input:

This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "

Expected output:

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "
Jens
  • 1
    A little confused by how it's worded. What exactly are you trying to replace? The START, the END and everything in between? Or just some specific characters in between START and END? – csoler Aug 21 '22 at 01:23
  • Only the quotes in between START and END and nothing else, so any quotes that are not in between START and END should be left alone – Jens Aug 21 '22 at 01:33
  • You could do something like this: `(?<=START).*(").*(?=END)` and replace the 1st group capture. I'm not great with regex, but that's how I could figure it – csoler Aug 21 '22 at 02:23
  • [example of using named groups and replace](https://www.demo2s.com/java/java-regular-expressions-named-groups.html) – csoler Aug 21 '22 at 02:27
  • 1
    Thank you for your answer, but your suggestion only seems to catch the last quote in between START and END in a group, and it skips any quotes that came before it. – Jens Aug 21 '22 at 03:02

3 Answers

2

Find all characters between START and END, and for those characters replace " with \".

To achieve this, apply a replacer function to all matches of characters between START and END (Matcher#replaceAll accepts a function since Java 9):

string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
    .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));

which produces your expected output.

Some notes on how this works.

The first step is to match all characters between START and END, which uses lookarounds with a reluctant quantifier:

(?<=START).*?(?=END)

The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:

START a"b END c"d START e"f END

A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
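To see the difference concretely, here is a minimal sketch that collects the matches of both variants on that input. The class name QuantifierDemo and the helper matches() are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class QuantifierDemo {
    // Collects every match of the given regex in the input, in order.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "START a\"b END c\"d START e\"f END";
        // Greedy: .* runs to the last END
        System.out.println(matches("(?<=START).*(?=END)", input));
        // Reluctant: .*? stops at the nearest END
        System.out.println(matches("(?<=START).*?(?=END)", input));
    }
}
```

The greedy pattern yields a single match that swallows c"d, while the reluctant pattern yields two separate matches, one per START/END pair.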

The next step is to replace " with \" within each match. The full match is group 0, which MatchResult#group() returns, and we don't need regex for this replacement: a plain String#replace is enough (and yes, replace() replaces all occurrences).
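If the function-based replaceAll is not available (it was added in Java 9), the same two steps can be sketched with an explicit find loop. This is only one possible rewrite, not the code from above: the class name EscapeBetweenFlags and the method escapeQuotes() are hypothetical, and it adds Matcher.quoteReplacement so that any backslash or $ inside the matched span is kept literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EscapeBetweenFlags {
    private static final Pattern BETWEEN_FLAGS =
            Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        Matcher m = BETWEEN_FLAGS.matcher(input);
        // StringBuffer because the StringBuilder overloads only exist since Java 9
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Plain (non-regex) replace: every " in the span becomes \"
            String escaped = m.group().replace("\"", "\\\"");
            // quoteReplacement stops \ and $ from being read as replacement syntax
            m.appendReplacement(out, Matcher.quoteReplacement(escaped));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "This \" is START an \" example input END string \"\"\n"
                + "START This is a \"\" second example END\n"
                + "This\" is \"a START third example END \" \"";
        System.out.println(escapeQuotes(input));
    }
}
```

Because the span is passed through quoteReplacement, the inner replace only needs to produce the final text \" and no extra level of backslash escaping.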

Bohemian
  • This works perfectly. I did have to modify it because the version of Java I'm using doesn't support lambdas, but it works great, thank you – Jens Aug 21 '22 at 14:00
0

For now I've been able to solve it by creating two capture groups and repeatedly replacing the match until there are no matches left. I had to insert a placeholder identifier, because replacing with \" directly would keep the " char there and create an infinite loop. Once there are no matches left, I replace the placeholder and get the expected result.

I still feel there has to be a much cleaner way to do this using only one replace statement...

Code that worked for me:

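class Playground {
    public static void main(String[] args) {
        String input = "\"ThSTARTis is a\" te\"\"stEND \" !!!";

        // Group 1: everything through START up to a quote.
        // Group 2: everything after that quote, through END, to the end of the input.
        String regex = "(.*START.*)\"+(.*END.*)";

        // Each pass swaps one quote between the flags for a placeholder.
        // Inserting \" directly would leave a " behind and loop forever.
        while (input.matches(regex)) {
            input = input.replaceAll(regex, "$1---replace---$2");
        }

        // Plain string replace: turn every placeholder into \"
        String result = input.replace("---replace---", "\\\"");

        System.out.println(result);
    }
}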

Output:

"ThSTARTis is a\" te\"\"stEND " !!!

I would love any suggestions as to how I could solve this in a better/cleaner way.

Jens
0

Another option is to use the \G anchor with 2 capture groups. In the replacement, use the 2 capture groups followed by \"

(?:(START)(?=.*END)|\G(?!^))((?:(?!START|END)(?>\\+\"|[^\r\n\"]))*)\"

Explanation

  • (?: Non capture group
    • (START)(?=.*END) Capture group 1, match START and assert there is END to the right
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, but not at the start of the string (or of a line in multiline mode)
  • ) Close non capture group
  • ( Capture group 2
    • (?: Non capture group
      • (?!START|END) Negative lookahead, assert not START or END directly to the right
      • (?>\\+\"|[^\r\n\"]) Atomic group: match one or more \ followed by ", or match any char except " or a newline
    • )* Close the non capture group and optionally repeat it
  • ) Close group 2
  • \" Match "
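To make the role of the two alternatives more tangible, here is a small sketch that records, for each match, which alternative started it and what group 2 captured. The class name GAnchorTrace, the method trace() and the sample input are invented for illustration; the pattern itself is the one explained above, compiled without flags because the sample is a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GAnchorTrace {
    static final Pattern P = Pattern.compile(
            "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"");

    // One entry per match: which alternative started it, and what group 2 captured.
    static List<String> trace(String input) {
        List<String> steps = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // Group 1 only participates when the match begins at START
            String via = (m.group(1) != null) ? "START" : "\\G";
            steps.add(via + " -> [" + m.group(2) + "]");
        }
        return steps;
    }

    public static void main(String[] args) {
        String input = "a\" START b\"c\"d END e\"";
        for (String step : trace(input)) {
            System.out.println(step);
        }
        // Same replacement as described above: both groups followed by \"
        System.out.println(P.matcher(input).replaceAll("$1$2\\\\\""));
    }
}
```

On this sample the first match begins at START and the second continues from where the first one ended via \G. The quotes before START and after END are never reached, because \G only holds directly after a previous match.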

See a Java regex demo and a Java demo

For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"";
String string = "This \" is START an \" example input END string \"\"\n"
+ "START This is a \"\" second example END\n"
+ "This\" is \"a START third example END \" \"";
String subst = "$1$2\\\\\"";

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);

String result = matcher.replaceAll(subst);

System.out.println(result);

Output

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "
The fourth bird
  • 2
    Unfortunately I already used another answer, but this looks great and doesn't use lambdas, so I wouldn't have had to rewrite it :) Great explanation as well – Jens Aug 21 '22 at 14:02