Some kind of wiki formatting makes it easy for users to avoid HTML: `**bold**` or `//italic//`, for example. What I am looking for is an efficient way to replace such formatting codes with HTML while preserving anything that is masked by `''` (two single quotes). Example:
`Replace **this** but do ''not touch **this**''`
Doing this in multiple steps would be quite easy:
```php
if (preg_match('/(\'\'|\*\*)(.*?)\1/', $s, $match)) {
    if ($match[1] === "''") {
        // Masked section: do not touch, further replacements will follow
    } else {
        // Replace by HTML
    }
}
```
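For reference, the multi-step idea can also be written as a single preg_replace_callback() pass. This is a minimal sketch that only handles `**bold**`; the function name `wikiBold()` is hypothetical. Each match is either a masked section, which is returned unchanged, or a bold section, which is wrapped in `<strong>`:

```php
<?php
// Hypothetical helper: one pass that keeps ''masked'' sections and converts **bold**.
function wikiBold(string $s): string
{
    return preg_replace_callback(
        '/(\'\'|\*\*)(.*?)\1/',
        function (array $m): string {
            if ($m[1] === "''") {
                // Masked section: return it unchanged, quotes included
                return $m[0];
            }
            // Bold section: wrap the inner text in <strong>
            return '<strong>' . $m[2] . '</strong>';
        },
        $s
    ) ?? $s; // preg_replace_callback() returns null on error
}

echo wikiBold("Replace **this** but do ''not touch **this**''"), "\n";
// Replace <strong>this</strong> but do ''not touch **this**''
```

The inner text of a bold section is not processed again here, so nested formatting would need further passes, and the callback adds a PHP function call per match.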
PHP's preg_replace() is quite efficient at replacing multiple patterns: when I pass arrays for pattern and replacement, I only call it once and avoid the per-call overhead. Example:
```php
$html = preg_replace(
    array(
        '/\\*\\*(.*?)\\*\\*/',   // **bold**
        '/__(.*?)__/',           // __underline__
        '/\\/\\/(.*?)\\/\\//'    // //italic//
    ),
    array(
        '<strong>\\1</strong>',
        '<u>\\1</u>',
        '<i>\\1</i>'
    ),
    $s
);
```
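Applied to the example string from the top, this array form converts both bold sections, including the one that should stay masked. A small runnable version (the test string is the example from the beginning of the question):

```php
<?php
// Plain array-based preg_replace(): all patterns run in one call,
// but nothing protects the ''masked'' section.
$s = "Replace **this** but do ''not touch **this**''";

$html = preg_replace(
    ['/\*\*(.*?)\*\*/', '/__(.*?)__/', '/\/\/(.*?)\/\//'],
    ['<strong>\1</strong>', '<u>\1</u>', '<i>\1</i>'],
    $s
);

echo $html, "\n";
// Replace <strong>this</strong> but do ''not touch <strong>this</strong>''
```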
By the way: this function will be called about 100 to 1000 times each time a dynamic page is created, hence my need for some efficiency.
So my question is: is there a way to encode the masking in a regular expression and replacement that I can use with preg_replace(), as in the last example? Of course, nested formatting should remain possible.
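One technique that may fit here is PCRE's backtracking control verbs, sketched under the assumption that the PCRE build behind preg_replace() supports `(*SKIP)` and `(*FAIL)` (PCRE has supported them for many years). Each pattern first matches a whole `''...''` section and is then forced to fail; because of `(*SKIP)`, the engine resumes scanning after that section instead of inside it. Only the second alternative can produce a real match, so masked text is left alone without empty tag pairs, and a last pattern strips the quotes. The `$mask` variable is an illustrative helper, not part of any API:

```php
<?php
// Prefix that consumes a masked section, then fails and skips past it.
$mask = '\'\'.*?\'\'(*SKIP)(*FAIL)|';

$s = "Replace **this** and //that// but do ''not touch **this**''";

$html = preg_replace(
    [
        '/' . $mask . '\*\*(.*?)\*\*/',   // **bold**
        '/' . $mask . '__(.*?)__/',       // __underline__
        '/' . $mask . '\/\/(.*?)\/\//',   // //italic//
        '/\'\'(.*?)\'\'/',                // finally strip the masking quotes
    ],
    ['<strong>\1</strong>', '<u>\1</u>', '<i>\1</i>', '\1'],
    $s
);

echo $html, "\n";
// Replace <strong>this</strong> and <i>that</i> but do not touch **this**
```

Since the masking alternative has no capturing group, the formatted content is still `\1` in every replacement. Nesting such as `**bold //italic//**` still works, because each pattern only skips masked sections and leaves other markup for the later patterns.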
What I found here (Condition inside regex pattern) is a way to remove such sections, but I cannot apply it to my problem, because the replacement inevitably leaves unwanted empty tag pairs:
```php
$html = preg_replace(
    array(
        '/(\'\'(.*?)\'\')|(__(.*?)__)/',
        '/(\'\'(.*?)\'\')|(\\*\\*(.*?)\\*\\*)/',
        '/\'\'(.*?)\'\'/'
    ),
    array(
        '\\1<u>\\4</u>',
        '\\1<strong>\\4</strong>',
        '\\1'
    ),
    $s
);
// Leaves an empty <u></u> and <strong></strong> for each masked section
```
Note: the `''` must survive every replacement except the last one, otherwise sections would be unmasked too early. That is why each replacement starts with `\1`.
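Running these conditional patterns on a string that uses `__underline__` makes the leftover tags visible: every masked section picks up one empty pair per pattern, placed directly after it. The test string is chosen for illustration:

```php
<?php
// The conditional patterns from above, unchanged apart from the test string.
$s = "Replace __this__ but do ''not touch __this__''";

$html = preg_replace(
    [
        '/(\'\'(.*?)\'\')|(__(.*?)__)/',
        '/(\'\'(.*?)\'\')|(\*\*(.*?)\*\*)/',
        '/\'\'(.*?)\'\'/',
    ],
    // Unmatched groups are substituted as empty strings
    ['\1<u>\4</u>', '\1<strong>\4</strong>', '\1'],
    $s
);

echo $html, "\n";
// Replace <u>this</u> but do not touch __this__<strong></strong><u></u>
```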
Of course I could strip all the empty tags at the end, but that seems rather clumsy to me. And I am quite sure I am just missing the obvious...
Thanks for any suggestions!