This question is related to a similar one, "Removing inline styles using php".

The solution there removes style attributes but not other attributes, e.g. those in <font face="Tahoma" size="4">.

Let's say I have a mix of inline styles and other attributes, like this:

<ul style="padding: 5px; margin: 5px;">
    <li style="padding: 2px;"><div style="border:2px solid green;">Some text</div></li>
    <li style="padding: 2px;"><font face="arial,helvetica,sans-serif" size="2">Some text</font></li>
    <li style="padding: 2px;"><font face="arial,helvetica,sans-serif" size="2">Some text</font></li>  
</ul>

What regular expression would produce this result?

<ul>
    <li><div>Some text</div></li>
    <li><font>Some text</font></li>
    <li><font>Some text</font></li>  
</ul>
Coreus

1 Answer

As usual, regex isn't ideal for parsing HTML; it's very possible you'd be better off with an actual HTML parser.

That said...

$noattributes = preg_replace('/<(\w+) [^>]+>/', '<$1>', $original);

...will replace any opening tag that contains attributes with the same tag without attributes. It might, however, also hit "tags" that appear inside quoted attribute values of other tags (and thus aren't actually tags themselves). It will also affect self-closing tags: <br /> becomes <br>. This can be avoided if the self-closing tags have no space between the tag name and the slash (<br/>).
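For comparison, here is a minimal sketch of the parser-based route using PHP's bundled DOM extension. The helper name stripAttributes() is hypothetical (it is not part of PHP or of the snippet above), and the sketch assumes ASCII input that libxml's HTML parser can recover from; charset handling is deliberately left out.

```php
<?php
// Sketch only: stripAttributes() is a hypothetical helper, not part of PHP.
// Assumes ASCII input; non-ASCII input needs explicit charset handling.
function stripAttributes(string $fragment): string
{
    if (trim($fragment) === '') {
        return $fragment; // loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrapping in <body> guarantees a known container to read back from.
    $doc->loadHTML('<body>' . $fragment . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Remove every attribute from every element inside the container.
    foreach ($body->getElementsByTagName('*') as $element) {
        while ($element->attributes->length > 0) {
            $element->removeAttribute($element->attributes->item(0)->nodeName);
        }
    }

    // Serialize only the container's children, not the html/body wrapper.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$original = '<ul style="padding: 5px;"><li style="padding: 2px;">'
    . '<font face="arial" size="2">Some text</font><br /></li></ul>';
echo stripAttributes($original), "\n";
```

Because the attributes are removed from parsed elements rather than matched as text, a ">" inside a quoted attribute value does not confuse it. The trade-off is that the output is re-serialized HTML, so whitespace and tag spelling may differ slightly from the input.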

Amber
  • Like so? $formatted = preg_replace('<(\w+) [^>]+>,'<$1>', $text); – Coreus Apr 16 '10 at 14:04
  • See my edited version; you have to remember to delimit the regex. – Amber Apr 16 '10 at 14:04
  • I agree, using HTML parsing is better – TravisO Apr 16 '10 at 14:19
  • Yes, of course. The thing is, I'm not parsing an entire XML/HTML document, I'm using xPath to retrieve the section I need, but the description for each item can contain some HTML (like the example provided). Using regExp on this section shouldn't hit too much performance-wise, should it? – Coreus Apr 16 '10 at 14:27
  • Probably not. If you're going to be using the same regexp multiple times, the PCRE evaluator usually caches the compiled form of the regex for you, so there's not too much of a hit. – Amber Apr 17 '10 at 19:03