
I'm trying to write a regular expression that replaces words which are not enclosed in brackets.

Here is what I currently have:

$this->parsed = preg_replace('/\b(?<!\[)('.preg_quote($word).')\b/','[$1['.implode(",",array_unique($types)).']]',$this->parsed);

Where $word could be one of the following, "Burkely Mayfair Trunk" or "Trunk".

It currently turns the sentence

This Burkely Mayfair Trunk is pretty nice

into

This [Burkely Mayfair [Trunk[productname]][productname]] is pretty nice

although it should become

This [Burkely Mayfair Trunk[productname]] is pretty nice

Since the words are replaced in order from the longest string to the shortest, shorter strings and repeated occurrences of word parts should not be replaced inside a part of the string that has already been replaced. It works when the word is the first part of the already-replaced string.

When I try to make the lookbehind variable-length, I get the following error: "Compilation failed: lookbehind assertion is not fixed length at offset 11". I have no idea how to fix this.

Does anyone have any ideas?

riekelt
  • Lookbehinds must be of a fixed length in most GREP implementations. What's the one you tried? (Also: Why is the replacement "from largest to smallest" an issue?) – Jongware Sep 09 '13 at 14:17
  • Let me try to rephrase this question: You want to match a certain word or phrase only if it is not bracketed, then bracket it ? – Ibrahim Najjar Sep 09 '13 at 14:21
  • Jongware: Replacement from largest to smallest isn't the issue, but the smallest strings could contain parts which are already present in the larger strings and thus don't have to be bracketed anymore because the larger string is already bracketed. @Sniffer Exactly, see my explanation to Jongware – riekelt Sep 09 '13 at 14:59
  • OK. I think I know exactly what you want. It is a bit hard to do, and I have one more question: what if one bracket is there but the other is missing? What should happen then, or can this never happen? – Ibrahim Najjar Sep 09 '13 at 15:17
  • This can never happen, since the brackets are generated programmatically. – riekelt Sep 09 '13 at 15:32
  • So why not check whether a word is contained in another word and, if so, eliminate it from the list of words to be matched and replaced? This would be a lot easier than using regular expressions, if that is even possible in the first place. – Ibrahim Najjar Sep 09 '13 at 15:48
  • That is not really an option, since a word could occur in the text apart from the already replaced part of the text. – riekelt Sep 09 '13 at 16:13

2 Answers

After another morning of playing with the regex I came up with a quite dirty solution which isn't flexible at all, but works for my use case.

$this->parsed = preg_replace('/\b(?!\[(|((\w+)(\s|\.))|((\w+)(\s|\.)(\w+)(\s|\.))))('.preg_quote($word).')(?!(((\s|\.)(\w+))|((\s|\.)(\w+)(\s|\.)(\w+))|)\[)\b/s','[$10['.implode(",",array_unique($types)).']]',$this->parsed);

It checks whether a bracket sits directly next to the specified keyword, or is separated from it by one or two words, either in front of or behind the keyword.

Still, it would be great to hear if anyone has a better solution.

riekelt

You can match any substring inside square brackets with the \[[^][]*] pattern, then use the (*SKIP)(*FAIL) PCRE verbs to discard that match, so that your own pattern only matches in any other context:

\[[^][]*](*SKIP)(*FAIL)|your_pattern_here
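As a minimal sketch of how this behaves, assume a made-up input string in which one "Trunk" is already bracketed and one is not, and a made-up `<...>` replacement just to make the matches visible:

```php
<?php
// Hypothetical input: one word is already bracketed, one is still plain.
$text = 'A [Trunk] and a Trunk';

// First alternative: consume a bracketed chunk, then discard it with
// (*SKIP)(*FAIL) so nothing inside it can be matched again.
// Second alternative: the pattern we actually want to replace.
$pattern = '/\[[^][]*](*SKIP)(*FAIL)|\bTrunk\b/';

// $0 is the whole match, which can only come from the second alternative.
$result = preg_replace($pattern, '<$0>', $text);

echo $result, PHP_EOL; // A [Trunk] and a <Trunk>
```

Only the unbracketed occurrence is replaced, because the bracketed chunk is consumed and thrown away before the second alternative gets a chance to look inside it.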

See the regex demo. To skip matches inside paired nested square brackets, use a recursion-based regex with a subroutine call (note that this requires a capturing group):

(?<skip>\[(?:[^][]++|(?&skip))*])(*SKIP)(*FAIL)|your_pattern_here
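A small sketch of the recursive variant, again with a made-up input and a made-up `<...>` replacement; here the brackets are nested one level deep:

```php
<?php
// Hypothetical input with nested brackets and plain occurrences around them.
$text = 'Trunk [a [Trunk] b Trunk] Trunk';

// The named group "skip" matches a balanced [...] chunk and calls itself
// through (?&skip) for every nested pair; the whole chunk is then dropped.
$pattern = '/(?<skip>\[(?:[^][]++|(?&skip))*])(*SKIP)(*FAIL)|\bTrunk\b/';

$result = preg_replace($pattern, '<$0>', $text);

echo $result, PHP_EOL; // <Trunk> [a [Trunk] b Trunk] <Trunk>
```

Both occurrences inside the outer brackets are left alone, regardless of nesting depth, as long as the brackets are balanced.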

See a regex demo

Also, since you are building the pattern dynamically, pass the regex delimiter (here, /) as the second argument to preg_quote, so that any delimiter character inside $word is escaped as well.
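To illustrate why the second argument matters, here is a short sketch with a hypothetical `$word` that contains a slash (the value is made up for the example):

```php
<?php
// Hypothetical word containing the delimiter character "/".
$word = 'Mayfair 1/2';

// Without the second argument the slash is left alone, so it would
// terminate a /.../ pattern early.
$withoutDelimiter = preg_quote($word);      // Mayfair 1/2

// With the delimiter passed in, the slash is escaped too.
$withDelimiter = preg_quote($word, '/');    // Mayfair 1\/2

// The safely quoted version can be embedded between / delimiters.
$found = preg_match('/\b' . $withDelimiter . '\b/', 'A Mayfair 1/2 bag');

echo $withoutDelimiter, PHP_EOL;
echo $withDelimiter, PHP_EOL;
echo $found, PHP_EOL; // 1
```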

Your solution is

$this->parsed = preg_replace(
    '/\[[^][]*\[[^][]*]](*SKIP)(*FAIL)|\b(?:' . preg_quote($word, '/') . ')\b/', 
    '[$0[' . implode(",", array_unique($types)) . ']]',
    $this->parsed);

The \[[^][]*\[[^][]*]] regex will match all those occurrences that have been wrapped with your replacement pattern:

  • \[ - a [
  • [^][]* - 0+ chars other than [ and ]
  • \[ - a [ char
  • [^][]* - 0+ chars other than [ and ]
  • ]] - a ]] substring.
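Putting it together, here is a sketch of the whole replacement loop on the sentence from the question. The `$words` array, its layout (word => types), and the extra trailing "Trunk" in the sentence are assumptions made for illustration; the question only shows `$word`, `$types` and `$this->parsed`.

```php
<?php
// Hypothetical word list: each word maps to its list of types.
$words = [
    'Trunk'                 => ['productname'],
    'Burkely Mayfair Trunk' => ['productname', 'productname'],
];

// Process the longest words first, as described in the question.
uksort($words, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Sentence from the question, plus a second standalone "Trunk" (assumption).
$parsed = 'This Burkely Mayfair Trunk is pretty nice, unlike that other Trunk';

foreach ($words as $word => $types) {
    $parsed = preg_replace(
        // Skip anything already wrapped as [text[types]], then match the word.
        '/\[[^][]*\[[^][]*]](*SKIP)(*FAIL)|\b(?:' . preg_quote($word, '/') . ')\b/',
        '[$0[' . implode(',', array_unique($types)) . ']]',
        $parsed
    );
}

echo $parsed, PHP_EOL;
// This [Burkely Mayfair Trunk[productname]] is pretty nice, unlike that other [Trunk[productname]]
```

The "Trunk" inside the already wrapped phrase is left untouched, while the standalone "Trunk" later in the sentence is still wrapped.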
Wiktor Stribiżew