First: I've read the general "don't use RegEx on XHTML" arguments, like this one: RegEx match open tags except XHTML self-contained tags, and I do understand how RegEx will fail on nested XHTML or XML nodes.
However, I don't see why manipulating only the attributes of an XML element should break when using RegEx, so there seem to be exceptions to the general rule. Attributes are always contained in a single tag starting with a < and ending with a >; any other < or > in between would break the XML, so that can't occur.
Now I'd like to clean an XHTML string of any microdata it might contain. That is, any of the attributes itemscope, itemtype, itemprop, itemid and itemref. Something like this:
...
<body itemscope="itemscope" itemtype="http://schema.org/WebPage">
<div itemprop="maincontent">content</div>
...
What's the best way to do this in PHP?
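For comparison, a DOM-based alternative to RegEx could look roughly like the minimal sketch below. It assumes the input is a well-formed XHTML fragment with a single root element, and stripMicrodata is a hypothetical helper name chosen for illustration, not part of any existing library.

```php
<?php
// Hypothetical helper: removes microdata attributes from a well-formed
// XHTML fragment. Assumes the fragment has exactly one root element.
function stripMicrodata(string $xhtml): string
{
    $doc = new DOMDocument();
    // loadXML() expects well-formed XML; malformed input would fail here.
    $doc->loadXML($xhtml);

    $xpath = new DOMXPath($doc);
    // Select every microdata attribute node anywhere in the document.
    $query = '//@itemscope | //@itemtype | //@itemprop | //@itemid | //@itemref';

    // Take a snapshot of the matches before removing anything.
    foreach (iterator_to_array($xpath->query($query)) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the root element, without an XML declaration.
    return $doc->saveXML($doc->documentElement);
}

$input = '<body itemscope="itemscope" itemtype="http://schema.org/WebPage">'
       . '<div itemprop="maincontent">content</div></body>';
echo stripMicrodata($input), "\n";
```

Other attributes such as class or id are left untouched, since the XPath query only selects the five microdata attribute names.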