
Consider the following strings:

targethelloluketestlukeluketestluktestingendtarget
sourcehelloluketestlukeluketestluktestingendsource

I want to replace all instances of luke with something else, but only when they are between target...endtarget, not when they are between source...endsource. The result should be that all three instances of luke in the top string are replaced with whatever I want.

I got this far, but this only captures one instance of luke. How do I replace all of them?

(?<=target)(?:.*?(luke).*?)(?=target)
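
The attempt matches only once per target region. The lookbehind and lookahead frame one long match (from after target up to the target inside endtarget), and a capturing group inside a single match records only one value. A quick check with preg_match_all() on the first string illustrates this. The variable names are just for illustration:

```php
<?php
$subject = "targethelloluketestlukeluketestluktestingendtarget";

// The question's pattern: one overall match spanning the whole region
$count = preg_match_all('/(?<=target)(?:.*?(luke).*?)(?=target)/', $subject, $m);

echo $count, "\n";    // 1
echo $m[0][0], "\n";  // helloluketestlukeluketestluktestingend
echo $m[1][0], "\n";  // luke (only the first one is captured)
```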

SOLUTION: Thanks to the help of this great community, I arrived at the following solution. I find RegEx really convoluted for cases like this, but in PHP the following works well and is a lot easier to understand:

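function replaceBetweenTags($starttag, $endtag, $replace, $with, $text) {
    // Escape the tags so they are matched literally
    $starttag = escapeStringToRegEx($starttag);
    $endtag = escapeStringToRegEx($endtag);
    // Find each start...end region (lazy, so separate regions don't merge;
    // the s modifier lets a region span line breaks) and do a plain
    // string replacement inside that region only
    $text = preg_replace_callback(
        '/' . $starttag . '.*?' . $endtag . '/s',
        function ($matches) use ($replace, $with) {
            return str_replace($replace, $with, $matches[0]);
        },
        $text
    );
    return $text;
}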

function escapeStringToRegEx($string)
{
    $string = str_replace('\\', '\\\\', $string);
    $string = str_replace('.', '\.', $string);
    $string = str_replace('^', '\^', $string);
    $string = str_replace('$', '\$', $string);
    $string = str_replace('*', '\*', $string);
    $string = str_replace('+', '\+', $string);
    $string = str_replace('-', '\-', $string);
    $string = str_replace('?', '\?', $string);
    $string = str_replace('(', '\(', $string);
    $string = str_replace(')', '\)', $string);
    $string = str_replace('[', '\[', $string);
    $string = str_replace(']', '\]', $string);
    $string = str_replace('{', '\{', $string);
    $string = str_replace('}', '\}', $string);
    $string = str_replace('|', '\|', $string);
    $string = str_replace(' ', '\s', $string);
    $string = str_replace('/', '\/', $string);
    return $string;
}

I'm aware that escapeStringToRegEx is quick and dirty, and maybe not even entirely correct, but it's a good starting point to work from.
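
PHP already has a built-in for this step: preg_quote() escapes every regex metacharacter, and its optional second argument also escapes the pattern delimiter. Below is a minimal sketch of the same helper built on it. The name replaceBetweenTagsQuoted is hypothetical, and the s modifier is an assumption so that regions may span line breaks. Unlike escapeStringToRegEx, preg_quote() does not turn spaces into \s, so a tag containing spaces is matched literally.

```php
<?php
// Hypothetical variant of replaceBetweenTags() that relies on the built-in
// preg_quote() instead of a hand-written escaping helper.
function replaceBetweenTagsQuoted(string $starttag, string $endtag, string $replace, string $with, string $text): ?string
{
    // preg_quote() escapes all regex metacharacters; passing '/' also
    // escapes the delimiter used in the pattern below.
    $pattern = '/' . preg_quote($starttag, '/') . '.*?' . preg_quote($endtag, '/') . '/s';

    return preg_replace_callback(
        $pattern,
        function (array $matches) use ($replace, $with): string {
            // Plain string replacement, applied only inside the matched region
            return str_replace($replace, $with, $matches[0]);
        },
        $text
    );
}

$input = "targethelloluketestlukeluketestluktestingendtarget\n"
       . "sourcehelloluketestlukeluketestluktestingendsource";
echo replaceBetweenTagsQuoted('target', 'endtarget', 'luke', 'peter', $input), "\n";
```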

Loek van Kooten

2 Answers


Here is a solution using a PHP regex callback function:

$input = "luke is here and targethelloluketestlukeluketestluktestingendtarget and luke is also here";
$output = preg_replace_callback(
    "/target.*?endtarget/",
    function ($matches) {
        return str_replace("luke", "peter", $matches[0]);
    },
    $input
);
echo $output;

This prints:

luke is here and targethellopetertestpeterpetertestluktestingendtarget and luke is also here

Note that occurrences of luke have been replaced with peter only inside the target ... endtarget bounds.

Tim Biegeleisen

You can use

(?:\G(?!\A)|target)(?:(?!luke|(?:end)?target).)*\Kluke(?=(?:(?!(?:end)?target).)*endtarget)

If the string has line breaks, you need to use the s flag, or prepend the pattern with the (?s) inline modifier (PCRE_DOTALL).

Regex details:

  • (?:\G(?!\A)|target) - either the position where the previous successful match ended (\G), excluding the very start of the string ((?!\A)), or the string target
  • (?:(?!luke|(?:end)?target).)* - any char, zero or more times (as many as possible), that does not start an endtarget, target or luke char sequence
  • \K - a match reset operator that discards the text matched so far
  • luke - the string to replace
  • (?=(?:(?!(?:end)?target).)*endtarget) - a positive lookahead that matches a location immediately followed by
    • (?:(?!(?:end)?target).)* - any char, zero or more times (as many as possible), that does not start an endtarget or target char sequence
    • endtarget - the string endtarget.
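
Because \K drops everything before luke from the reported match, the pattern can be passed directly to preg_replace() with no callback: each luke is replaced in place, and \G chains the matches within one target region. A minimal sketch, with the s flag added as described above. $input and the replacement peter are just for illustration:

```php
<?php
// Single-pass replacement: \G continues from the previous match, \K keeps
// only "luke" in the match, and the lookahead requires a closing endtarget.
$pattern = '/(?:\G(?!\A)|target)(?:(?!luke|(?:end)?target).)*\Kluke(?=(?:(?!(?:end)?target).)*endtarget)/s';

$input = "targethelloluketestlukeluketestluktestingendtarget "
       . "sourcehelloluketestlukeluketestluktestingendsource";

echo preg_replace($pattern, 'peter', $input), "\n";
// targethellopetertestpeterpetertestluktestingendtarget sourcehelloluketestlukeluketestluktestingendsource
```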

If you can use preg_replace_callback, use it:

$output = preg_replace_callback('/target.*?endtarget/s', function ($m) {
    return str_replace("luke", "<SOME>", $m[0]);
}, $input);

Or, unrolling the loop:

$output = preg_replace_callback('/target[^e]*(?:e(?!ndtarget)[^e]*)*endtarget/', function ($m) {
    return str_replace("luke", "<SOME>", $m[0]);
}, $input);
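
The unrolled pattern uses no dot: both [^e]* and e(?!ndtarget) can match line breaks, so it finds regions that span lines even without the s modifier, whereas the lazy-dot pattern without s does not. A small comparison sketch; the helper name replaceInTargets is hypothetical:

```php
<?php
// Hypothetical helper: replace "luke" inside every region matched by $pattern.
function replaceInTargets(string $pattern, string $input): ?string
{
    return preg_replace_callback($pattern, function (array $m): string {
        return str_replace('luke', '<SOME>', $m[0]);
    }, $input);
}

$input = "targethello\nluketestendtarget";

// Lazy dot without the s flag: "." does not match "\n", so nothing is replaced
echo replaceInTargets('/target.*?endtarget/', $input), "\n";

// Unrolled loop: [^e] also matches "\n", so the region is found
echo replaceInTargets('/target[^e]*(?:e(?!ndtarget)[^e]*)*endtarget/', $input), "\n";
```
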
Wiktor Stribiżew
  • @LoekvanKooten You need to learn more about [tempered greedy tokens](https://stackoverflow.com/a/37343088/3832970). – Wiktor Stribiżew Jul 30 '21 at 10:07
  • Thank you so much for your help! I still find RegEx really complicated for instances like this, but preg_replace_callback makes it much easier. I've written a small function for people bumping into the same issue. See my edited posting. – Loek van Kooten Jul 30 '21 at 10:21
  • @LoekvanKooten There are no easy paths in a case like yours. Even `preg_replace_callback` with a pattern like Tim's has caveats (matching across multiple lines, or very long input text). That is why I provided two more variations of the `preg_replace_callback` solution. The plain regex solution is a last resort, for when you have no access to code. – Wiktor Stribiżew Jul 30 '21 at 10:23
  • I wish I could select both answers, as they were both really good. Yours is the most insightful, but it was Tim who came up with preg_replace_callback first, and that made things a lot easier to understand. Nonetheless I have upvoted this and am forever grateful for all your help. I really appreciate it. Thank you. – Loek van Kooten Jul 30 '21 at 10:25