
I have the following example text:

<p>in <span class="nanospell-typo">der</span> <span class="nanospell-typo"><dreipc data-type="abbreviation" data-uid="41">DDR</dreipc></span> <span class="nanospell-typo">kollaborieren</span>, <span class="nanospell-typo">gibt</span> es</p>
<li>per Post an <strong>&lt;dreipc data-type="abbreviation" data-uid="48"&gt;someAbbreviation&lt;/dreipc&gt;, 10106 Berlin</strong> oder</li>

and the following two regex patterns:

/(?:\<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:\>)(.*?)(?:\<\/dreipc\>)/
/(?:&lt;dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:&gt;)(.*?)(?:&lt;\/dreipc&gt;)/

The first regex works on regex101.com and in PHP. The second one matches on regex101.com but not in PHP, and I don't understand why. Actually I would only need the first regex, but it finds no matches when the tags are written as HTML entities. That's why I added a second pattern. I also don't want to use html_entity_decode() on my string: it is usually very long, and I don't want to decode HTML entities that might be intended.

My PHP code looks like this:

class MyClass {
    const DREIPC_REGEX = '/(?:\<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:\>)(.*?)(?:\<\/dreipc\>)/';
    const DREIPC_REGEX_HTMLENTITIES = '/(?:&lt;dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:&gt;)(.*?)(?:&lt;\/dreipc&gt;)/';


    public static function pregMatchHTMLNode($string = '')
    {
        $result = [];
        preg_match_all(self::DREIPC_REGEX, $string, $matches, PREG_SET_ORDER, 0);
        preg_match_all(self::DREIPC_REGEX_HTMLENTITIES, $string, $matchesHtmlentities, PREG_SET_ORDER, 0);
        $matches = array_merge($matches, $matchesHtmlentities);

        // ... doing some other things with $matches
        return $result;
    }
}

So the best solution would be to get preg_match_all() working with my second pattern. But how?
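To narrow down whether the pattern or the input is at fault, the second pattern can be run against two versions of the entity-encoded sample line. This is a hypothetical reproduction, not code from the question: it assumes the string PHP actually receives has its double quotes encoded as &quot; (produced here with htmlspecialchars()), and the variable names are illustrative.

```php
<?php
// Hypothetical reproduction: the question's second pattern, unchanged.
$pattern = '/(?:&lt;dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:&gt;)(.*?)(?:&lt;\/dreipc&gt;)/';

// 1) Entity-encoded brackets but literal double quotes, as the sample line is displayed.
$asDisplayed = '&lt;dreipc data-type="abbreviation" data-uid="48"&gt;someAbbreviation&lt;/dreipc&gt;';

// 2) Assumption: the same markup with the quotes encoded too.
//    htmlspecialchars() turns " into &quot; under its default flags.
$asStored = htmlspecialchars('<dreipc data-type="abbreviation" data-uid="48">someAbbreviation</dreipc>');

$hitsDisplayed = preg_match_all($pattern, $asDisplayed, $m1);
$hitsStored = preg_match_all($pattern, $asStored, $m2);

echo "as displayed: $hitsDisplayed match(es)\n";
echo "as stored:    $hitsStored match(es)\n";
echo $asStored, "\n"; // shows the &quot; sequences the pattern does not expect
```

With these inputs the first count is 1 and the second is 0, which would suggest the pattern is fine and the difference lies in what the string really contains.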

Falk

1 Answer


The quotes in my frontend output were not encoded, only the angle brackets of my tags. But var_dump() showed that the quotes are encoded as well (as &quot;). So I changed the regex pattern to:

(?:&lt;dreipc\ )(?:[^\>]*)(?:data\-type\=&quot;)(.*?)(?:&quot;\ data\-uid\=\&quot;)(.*?)(?:&quot;&gt;)(.*?)(?:&lt;\/dreipc&gt;)

Now it works.
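Since the question would ideally use a single regex, the raw and the encoded variants can also be folded into one pattern with alternations, so one preg_match_all() call finds both. This is a sketch, not the approach from the answer above: the constant name DREIPC_REGEX_ANY is hypothetical, it assumes data-type always comes before data-uid (as in the sample text), and it would also accept mixed forms such as an opening " closed by &quot;.

```php
<?php
// Hypothetical constant: matches <dreipc ...>...</dreipc> and its
// entity-encoded form &lt;dreipc ...&gt;...&lt;/dreipc&gt; in one pass.
// The x flag lets whitespace and # comments be used inside the pattern.
const DREIPC_REGEX_ANY = '~
    (?:<|&lt;)dreipc\s[^>]*?                   # opening "<dreipc " or "&lt;dreipc "
    data-type=(?:"|&quot;)(.*?)(?:"|&quot;)    # group 1: data-type value
    \s+data-uid=(?:"|&quot;)(.*?)(?:"|&quot;)  # group 2: data-uid value
    (?:>|&gt;)                                 # end of the opening tag
    (.*?)                                      # group 3: tag content
    (?:<|&lt;)/dreipc(?:>|&gt;)                # closing tag
~x';

// Shortened sample: one raw tag and one fully entity-encoded tag.
$html = '<p>in der <span class="nanospell-typo"><dreipc data-type="abbreviation" data-uid="41">DDR</dreipc></span> gibt es</p>'
      . '<li>per Post an <strong>&lt;dreipc data-type=&quot;abbreviation&quot; data-uid=&quot;48&quot;&gt;someAbbreviation&lt;/dreipc&gt;, 10106 Berlin</strong> oder</li>';

preg_match_all(DREIPC_REGEX_ANY, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "{$m[1]} / {$m[2]} / {$m[3]}\n";
}
// abbreviation / 41 / DDR
// abbreviation / 48 / someAbbreviation
```

Because the alternations live in the pattern, the string itself is never decoded, which fits the requirement of not running html_entity_decode() over the whole text.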

Falk