
How to remove all style attributes except the style attributes inside a table - PHP

For example:

<div style="text-align: justify; text-indent: -13.5pt;"><strong>Motion with Constant Acceleration</strong></div>
<table cellspacing="0" cellpadding="0" border="1" style="border: medium none; border-collapse: collapse;">
<tr><td width="114" style="border: 1pt;"><div align="center">&nbsp;</div></td>
<td width="264" style="border-width: 1pt 1pt 1pt medium;" colspan="2">Data Sheet</td>
<td width="157" style="border-width: 1pt 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
</tr>
<tr style="height: 0.4in;"><td width="114" style="border-width: medium 1pt 1pt;"><div align="center">&nbsp;</div></td>
<td width="156" style="border-width: medium 1pt 1pt medium;">Incline angle</td>
<td width="108" style="border-width: medium 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
<td width="157" style="border-width: medium 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
</tr>
</table>

My output should be like this (Note the div tag):

<div><strong>Motion with Constant Acceleration</strong></div>
<table cellspacing="0" cellpadding="0" border="1" style="border: medium none; border-collapse: collapse;">
<tr><td width="114" style="border: 1pt;"><div align="center">&nbsp;</div></td>
<td width="264" style="border-width: 1pt 1pt 1pt medium;" colspan="2">Data Sheet</td>
<td width="157" style="border-width: 1pt 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
</tr>
<tr style="height: 0.4in;"><td width="114" style="border-width: medium 1pt 1pt;"><div align="center">&nbsp;</div></td>
<td width="156" style="border-width: medium 1pt 1pt medium;">Incline angle</td>
<td width="108" style="border-width: medium 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
<td width="157" style="border-width: medium 1pt 1pt medium;"><div align="center">&nbsp;</div></td>
</tr>
</table>
Fero
  • You shouldn't be using [regexes to handle HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You should use a proper HTML parser, either stripping out the style attributes while parsing or walk the document tree to strip them. – outis Jun 06 '11 at 11:01

2 Answers


It is a bad idea to parse or hack HTML with regex. That said, you can try something like:

 s/(?<!table[^>])style=".*"//

Meaning: replace style="..." with nothing when, looking backward from the attribute, there is no "table" before it within the same tag (that is, before any > character).

It might need some fine-tuning to work; I haven't tried it, and I still think it's a bad idea.

To fine-tune it, I suggest looking at look-behind assertions in regex. I don't know whether look-behind is supported by PHP's regex engine, so that is up to you to check. This is a skeleton rather than a complete answer.
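For comparison, here is a minimal sketch of the parser-based route suggested in the comment on the question, using PHP's bundled DOM extension instead of a regex. The function name `strip_styles_outside_tables` and the wrapper id `fragment-root` are made up for illustration, and the sketch assumes the input is a UTF-8 HTML fragment. It treats "inside a table" as "the element is a `<table>` or has a `<table>` ancestor", which is one reading of the question.

```php
<?php
// Sketch only: removes style attributes from every element that is
// neither a <table> nor nested inside one. Helper name is hypothetical.
function strip_styles_outside_tables(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely perfect, so collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // The XML prolog is a common hint to make libxml read the input as UTF-8.
    // The wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Elements carrying a style attribute that are not a table
    // and have no table ancestor.
    foreach ($xpath->query('//*[@style][not(ancestor-or-self::table)]') as $node) {
        $node->removeAttribute('style');
    }

    // Serialize only the children of the wrapper, not the html/body shell.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `strip_styles_outside_tables($html)` on the markup from the question should leave the `table`, `tr` and `td` styles alone and drop the one on the leading `div`. The serializer may normalize whitespace and entities slightly, so compare the result structurally rather than byte for byte.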

Bruce

To do this properly, I recommend using HTML Purifier: http://htmlpurifier.org/. It is one of the few highly configurable HTML filters that handles this kind of cleanup in a secure, robust way.

You can experiment with the allowed properties in the demo: http://htmlpurifier.org/demo.php

Configuration documentation: http://htmlpurifier.org/live/configdoc/plain.html#CSS.AllowedProperties
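As a rough sketch of how that could look, the whitelist below uses the `HTML.Allowed` directive so that `style` is only permitted on `table`, `tr` and `td`. The include path and the exact element list are assumptions based on the markup in the question, so adjust both for your setup. Note that HTML Purifier also validates the CSS itself, so it may normalize or drop declarations it does not accept, and with this whitelist a `div` nested inside a table cell would lose its `style` too.

```php
<?php
// Assumes HTML Purifier is installed and this path resolves to its
// autoloader; with Composer you would require vendor/autoload.php instead.
require_once 'HTMLPurifier.auto.php';

// Sample input, shortened from the question.
$dirtyHtml = '<div style="text-align: justify;"><strong>Motion with Constant Acceleration</strong></div>'
    . '<table border="1" style="border-collapse: collapse;">'
    . '<tr><td width="114" style="border: 1pt;"><div align="center">Data Sheet</div></td></tr>'
    . '</table>';

$config = HTMLPurifier_Config::createDefault();

// Whitelist of elements and their attributes: anything not listed is removed.
// "style" appears only on table, tr and td, so it is stripped everywhere else.
$config->set(
    'HTML.Allowed',
    'div[align],strong,table[style|border|cellspacing|cellpadding],tr[style],td[style|width|colspan]'
);

$purifier = new HTMLPurifier($config);
$cleanHtml = $purifier->purify($dirtyHtml);

echo $cleanHtml;
```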

Arend