
In a very large string, I need to delete every [w:r]...[/w:r] block that contains the substring "delete". Here is an example of a block I want to delete:

[w:r w:rsidR="00A37EED" w:rsidRPr="00FE1BE1"][w:rPr][w:b][/w:rPr][w:t]delete[/w:t][/w:r]

My best guess is \[w:r.*delete.*\[\/w:r\]. I have tried several regular expressions, but regex is not my strong suit.

I pasted the string into regex101; here is the link: https://regex101.com/r/wS4bL2/1

The pattern does find the required text, but I can't make it stop at the first occurrence of [/w:r].

PHP code, in case you are wondering:

$this->tempDocumentMainPart = preg_replace('/\[w:r.*delete.*\[\/w:r\]/','',$this->tempDocumentMainPart);
Su4p

1 Answer


The .* is greedy, so it runs across several [...] blocks, from the first [w:r to the last [/w:r]. One way to prevent that is a tempered greedy token:

\[w:r\b(?:(?!\[w:r\b).)*?delete(?:(?!\[w:r\b).)*?\[\/w:r]
        ^^^^^^^^^^^^^^^^^       ^^^^^^^^^^^^^^^^^

See the regex demo

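The (?:(?!\[w:r\b).)*? tempered greedy token matches any character that does not start another [w:r (with a word boundary on the right, so [w:rPr is not treated as a new block). This keeps the match inside a single [w:r block.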

Add the DOTALL modifier /s ('/PATTERN/s') so that . also matches newlines and the pattern can match across lines.
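
To make the difference concrete, here is a minimal sketch that runs the greedy pattern from the question and the tempered pattern on the same input. The sample string and the variable names are made up for illustration; only the two patterns come from this thread.

```php
<?php
// Hypothetical sample input: three runs, only the middle one contains "delete".
$doc = '[w:r][w:t]keep A[/w:t][/w:r]'
     . '[w:r w:rsidR="00A37EED"][w:rPr][w:b][/w:rPr][w:t]delete[/w:t][/w:r]'
     . '[w:r][w:t]keep B[/w:t][/w:r]';

// Greedy pattern from the question: .* stretches from the first [w:r
// to the last [/w:r], so every run is swallowed.
$greedy = preg_replace('/\[w:r.*delete.*\[\/w:r\]/', '', $doc);

// Tempered greedy token: each character is consumed only if it does not
// start another [w:r, so the match cannot leave the run it started in.
// The /s modifier lets . match newlines as well.
$tempered = preg_replace(
    '/\[w:r\b(?:(?!\[w:r\b).)*?delete(?:(?!\[w:r\b).)*?\[\/w:r]/s',
    '',
    $doc
);

var_dump($greedy);   // string(0) ""
var_dump($tempered); // the two "keep" runs, with the "delete" run removed
```

On this input the greedy version returns an empty string, while the tempered version removes only the middle run.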

Wiktor Stribiżew
  • It's hard to boost this regex's performance much by unrolling the pattern as [`'~\[w:r\b[^[]*(?:\[(?!w:r\b)[^[]*)*delete[^[]*(?:\[(?!w:r\b)[^[]*)*?\[\/w:r]~'`](https://regex101.com/r/mN4sR3/2), because the input is full of square brackets :( But it is still faster. – Wiktor Stribiżew Jul 20 '16 at 13:37
  • It seems great. Let me try this on multiple examples and I'll be back with the green check. – Su4p Jul 20 '16 at 13:51
  • I suggest using the unrolled variant: it is faster and thus more reliable. – Wiktor Stribiżew Jul 20 '16 at 13:51
  • Yeah, that's what I used. – Su4p Jul 20 '16 at 14:12
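
For anyone comparing the two patterns from this thread, the sketch below runs both on the same multi-line input and checks that they remove the same run. The sample string and variable names are hypothetical. Note that the unrolled pattern needs no /s modifier, because the negated class [^[] already matches newlines.

```php
<?php
// Hypothetical multi-line input; the middle run contains a newline.
$doc = "[w:r][w:t]keep A[/w:t][/w:r]\n"
     . "[w:r w:rsidRPr=\"00FE1BE1\"][w:rPr][w:b][/w:rPr]\n[w:t]delete[/w:t][/w:r]\n"
     . "[w:r][w:t]keep B[/w:t][/w:r]";

// Tempered greedy token from the answer (needs /s to cross newlines).
$temperedPattern = '/\[w:r\b(?:(?!\[w:r\b).)*?delete(?:(?!\[w:r\b).)*?\[\/w:r]/s';

// Unrolled variant from the comment: [^[]* consumes plain text in bulk,
// and \[(?!w:r\b) admits any bracket that does not open a new [w:r run.
$unrolledPattern = '~\[w:r\b[^[]*(?:\[(?!w:r\b)[^[]*)*delete[^[]*(?:\[(?!w:r\b)[^[]*)*?\[\/w:r]~';

$viaTempered = preg_replace($temperedPattern, '', $doc);
$viaUnrolled = preg_replace($unrolledPattern, '', $doc);

var_dump($viaTempered === $viaUnrolled); // bool(true)
var_dump($viaUnrolled);
```

Both calls leave the two "keep" runs and the surrounding newlines in place. This only checks that the results agree on one small input; it says nothing about relative speed, which would have to be measured on the real document.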