
I need to detect the nesting of one tag in another to raise an error.

Examples :

anything <amb id="1">word1</amb> anything <amb id="2">word2</amb> anything // OK

anything <amb id="1">anything<amb id="2">word2</amb>anything</amb> anything // KO

It is therefore necessary to detect the presence of tags <amb... or </amb> between the tags <amb... and </amb>

I have a beginning of a pattern, but I can't manage the nested presence of the tag.

// #\<amb(.*?)\<\/amb\># => OK : detect the first level
$pattern = '#\<amb(?!\<amb)\<\/amb\>#'; // KO

if(preg_match($pattern, $string)) {
  throw new Exception('No nested tags are allowed.');
}

How do I solve this problem?

Emma
Alexdu98

2 Answers


One way to detect nesting is to check whether two consecutive <amb tags appear without a </amb> tag between them; if they do, the tags are nested and you can reject the string. This negative-lookahead regex does the job. Note that it matches the whole string only when no nesting is found:

^(?!.*<amb(?:(?!<\/amb).)+<amb).+$

Regex Demo 1
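
To show how the first pattern could be used from PHP, here is a minimal sketch. The function name hasNoNestedAmb is hypothetical, the "~" delimiter is chosen only so that "/" needs no escaping, and the "s" flag is an assumption added so that "." also crosses line breaks.

```php
<?php
// Hypothetical helper (not from the answer): true when no <amb> is nested in another.
function hasNoNestedAmb(string $text): bool
{
    // The first pattern above, with "~" delimiters so "/" needs no escaping.
    // The "s" flag is an assumption: it lets "." match newlines too.
    $pattern = '~^(?!.*<amb(?:(?!</amb).)+<amb).+$~s';

    // The pattern matches strings WITHOUT nesting, so a match means "OK".
    return preg_match($pattern, $text) === 1;
}

$ok = 'anything <amb id="1">word1</amb> anything <amb id="2">word2</amb> anything';
$ko = 'anything <amb id="1">anything<amb id="2">word2</amb>anything</amb> anything';

var_dump(hasNoNestedAmb($ok)); // bool(true)
var_dump(hasNoNestedAmb($ko)); // bool(false)
```

Because the pattern ends in .+ it does not match an empty string, so this helper returns false for empty input; handle that case separately if it matters.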

Similarly, you can check whether two consecutive </amb> tags appear without an <amb tag between them, which also means the tags are nested. This negative-lookahead regex rejects such strings:

^(?!.*<\/amb>(?:(?!<amb).)+<\/amb>).+$

Regex Demo 2
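
Both patterns above match only strings that are free of nesting, so the check in the question (throw when preg_match() matches) would have to be inverted. As an alternative sketch, the inner part of the first pattern can be searched for directly, which keeps the shape of the question's code. The function name assertNoNestedAmb and the "s" flag are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: throws when an <amb opens before the previous one is closed.
function assertNoNestedAmb(string $text): void
{
    // "<amb", then one or more characters that never start "</amb", then another "<amb".
    // The "s" flag (an assumption) lets "." match newlines too.
    $nested = '~<amb(?:(?!</amb).)+<amb~s';

    if (preg_match($nested, $text) === 1) {
        throw new Exception('No nested tags are allowed.');
    }
}

try {
    assertNoNestedAmb('a <amb id="1">x<amb id="2">y</amb>z</amb> b');
    echo "OK\n";
} catch (Exception $e) {
    echo 'KO: ' . $e->getMessage() . "\n"; // prints "KO: No nested tags are allowed."
}
```

Here a match means nesting was found, so no anchors or outer lookahead are needed.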

Let me know if this works for you.

Pushpesh Kumar Rajwanshi
    Thank you! It seems to be working! I didn't know how to write it in regex, and given the complexity, I wasn't ready to do it. – Alexdu98 May 01 '19 at 20:06

You don't need regular expressions for this. They are a pain. What you can do is explode the string on </amb> and then check that each part has at most one <amb in it. Like so:

function correctlyNested($html, $tag)
{
    foreach (explode("</$tag>", strtolower($html)) as $part) {
       if (substr_count($part, "<$tag") > 1) return false; // it is KO
    }
    return true; // it is OK
}


$tests = ['anything <amb id="1">word1</amb> anything <amb id="2">word2</amb> anything',
          'anything <amb id="1">anything<amb id="2">word2</amb>anything</amb> anything'];

foreach ($tests as $test) {
    echo $test . (correctlyNested($test, "amb") ? " // OK<br>" : " // KO<br>");
}

This code is easy to understand and maintain. I added the strtolower() to show how easy it is to extend this code.

KIKO Software
  • Yes you're right, the use of a regex was not mandatory to solve the problem, but for reasons of homogeneity I preferred. Thanks anyway for your answer, it also works. – Alexdu98 May 01 '19 at 20:08