I am using regex in PHP to strip orphan italic HTML tags from a line of text (leaving the text untouched when the tags are paired), using regex assertions. This works when I want to remove an <i> that is not followed by </i> later in the same line, using this statement:
$buffer = preg_replace('/(.*)<i>(?!.*<\/i>)(.*)/', "$1$2$3", $buffer);
However, when I try to do the same for </i> tags not preceded by <i> in the same line, using:
$buffer = preg_replace('/(.*)(?<!<i>.*)<\/i>(.*)/', "$1$2$3", $buffer);
It returns null (which I think means the pattern has a syntax error). I know that the problem is the "*" after "?<!<i>." because if I use:
$buffer = preg_replace('/(.*)(?<!<i>.)<\/i>(.*)/', "$1$2$3", $buffer);
and test with a string that has only one character between <i> and </i> (like "TEST<i>1</i>WORKS"), it works fine. Of course, this is not useful, as I do not know how many characters there will be between <i> and </i> in practice. I assumed that the negative lookbehind assertion in the second command would behave symmetrically to the negative lookahead assertion in the first, but that does not seem to be the case.
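For reference, here is a minimal sketch of how the null result can be inspected. It assumes PHP 8 or later; the variable names $result and $warning are only illustrative. As far as I understand, preg_replace returns null whenever the pattern fails to compile, and PCRE requires a lookbehind to have a bounded length, which ".*" does not have, while a lookahead has no such restriction. The exact warning text depends on the PCRE version, so the sketch prints whatever was recorded rather than assuming a specific message.

```php
<?php
$buffer = 'TEST<i>1</i>WORKS';

// Same pattern as the failing statement above. The @ only hides the
// warning from the output; error_get_last() can still read it afterwards.
$result = @preg_replace('/(.*)(?<!<i>.*)<\/i>(.*)/', "$1$2$3", $buffer);

// null signals an error, not "no match": with no match, preg_replace
// would have returned the subject string unchanged.
var_dump($result === null);

// Print the recorded warning, if any (wording varies by PCRE version).
$warning = error_get_last();
echo $warning['message'] ?? 'no warning recorded', PHP_EOL;
```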
Can someone well versed in regex tell me how to work around this issue?
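One direction that might avoid the lookbehind altogether, sketched here under assumptions: first match every well-formed <i>...</i> pair and discard that match with the PCRE backtracking verbs (*SKIP)(*FAIL), then let a second alternative match whatever <i> or </i> is left over. The function name strip_orphan_italic_tags is hypothetical, and the sketch assumes tags are written exactly as <i> and </i> (lowercase, no attributes). Note that its rule is slightly different from my two statements above: a tag counts as paired only if it has a partner with no other italic tag in between.

```php
<?php
// Hypothetical helper, not part of the original code.
function strip_orphan_italic_tags(string $line): string
{
    // Alternative 1: a complete <i>...</i> pair with no other <i> or </i>
    // inside it. (*SKIP)(*FAIL) rejects that match and resumes scanning
    // after the pair, so paired tags are never touched.
    // Alternative 2: any <i> or </i> that alternative 1 did not consume,
    // which is an orphan by definition.
    // "." does not match a newline by default, so a pair must sit on one line.
    $pattern = '~<i>(?:(?!</?i>).)*</i>(*SKIP)(*FAIL)|</?i>~';

    $result = preg_replace($pattern, '', $line);

    // preg_replace returns null on error; fall back to the input in that case.
    return $result ?? $line;
}

echo strip_orphan_italic_tags('TEST<i>1</i>WORKS'), PHP_EOL; // TEST<i>1</i>WORKS
echo strip_orphan_italic_tags('TEST</i>1 WORKS'), PHP_EOL;   // TEST1 WORKS
echo strip_orphan_italic_tags('TEST<i>1 WORKS'), PHP_EOL;    // TEST1 WORKS
```

This handles both orphan kinds in a single pass, so the separate lookahead statement would no longer be needed if this approach is acceptable.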
Thanks to all and best regards.