I have the following example text:
<p>in <span class="nanospell-typo">der</span> <span class="nanospell-typo"><dreipc data-type="abbreviation" data-uid="41">DDR</dreipc></span> <span class="nanospell-typo">kollaborieren</span>, <span class="nanospell-typo">gibt</span> es</p>
<li>per Post an <strong><dreipc data-type="abbreviation" data-uid="48">someAbbreviation</dreipc>, 10106 Berlin</strong> oder</li>
and the following two regex patterns:
/(?:\<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:\>)(.*?)(?:\<\/dreipc\>)/
/(?:<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:>)(.*?)(?:<\/dreipc>)/
The first regex works on regex101.com and in PHP. The second matches on regex101.com but not in PHP, and I don't understand why. Ideally I would only need the first regex, but it finds no matches when the markup contains HTML entities. That's why I added the second pattern. I also don't want to run html_entity_decode() on my string: the string is usually very long, and I don't want to decode entities that are there intentionally.
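To make the problem reproducible, here is a minimal sketch of what I mean. It assumes that the entity-encoded variant of the markup only has `<` and `>` turned into `&lt;` and `&gt;`; I produce it here with `htmlspecialchars()` and `ENT_NOQUOTES` purely for illustration, so the real input may be encoded differently. The variable names are made up for this example.

```php
<?php
// Pattern copied verbatim from the first regex above.
$pattern = '/(?:\<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:\>)(.*?)(?:\<\/dreipc\>)/';

// Raw markup, taken from the example text.
$raw = '<strong><dreipc data-type="abbreviation" data-uid="48">someAbbreviation</dreipc>, 10106 Berlin</strong>';

// Assumption: the encoded variant has only < and > replaced by entities.
$encoded = htmlspecialchars($raw, ENT_NOQUOTES);

// preg_match_all() returns the number of full pattern matches.
$rawCount = preg_match_all($pattern, $raw, $rawMatches, PREG_SET_ORDER);
$encodedCount = preg_match_all($pattern, $encoded, $encodedMatches, PREG_SET_ORDER);

var_dump($rawCount);     // matches found in the raw markup
var_dump($encodedCount); // matches found in the entity-encoded markup
```

Under that assumption, the first pattern finds the tag in `$raw` but nothing in `$encoded`, which is the case the second pattern is meant to cover.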
My PHP code looks like this:
class MyClass {
    const DREIPC_REGEX = '/(?:\<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:\>)(.*?)(?:\<\/dreipc\>)/';
    const DREIPC_REGEX_HTMLENTITIES = '/(?:<dreipc\ )(?:[^\>]*)(?:data\-type\=\")(.*?)(?:\"\ data\-uid\=\")(.*?)(?:>)(.*?)(?:<\/dreipc>)/';

    public static function pregMatchHTMLNode($string = '')
    {
        $result = [];
        // Collect matches for the raw markup and for the entity-encoded markup.
        preg_match_all(self::DREIPC_REGEX, $string, $matches, PREG_SET_ORDER, 0);
        preg_match_all(self::DREIPC_REGEX_HTMLENTITIES, $string, $matchesHtmlentities, PREG_SET_ORDER, 0);
        $matches = array_merge($matches, $matchesHtmlentities);
        // ... doing some other things with matches
        return $result;
    }
}
So the best solution would be to get preg_match_all() working with my second pattern. But how?