
I have this HTML code:

$html = '<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;" class=MsoNormal>text</P>';

I need to remove all the mso-* style declarations, so that the result is:

$html = '<P style="padding: 4px;" class=MsoNormal>text</P>';

How can I do this with PHP? Many thanks.

Ste
  • possible duplicate of [PHP to clean-up pasted Microsoft input](http://stackoverflow.com/questions/379342/php-to-clean-up-pasted-microsoft-input) – Pekka Mar 20 '12 at 11:37
  • @Pekka not a good dup imo. It basically just says use HTMLPurifier or Tidy and there is just one answer altogether. – Gordon Mar 20 '12 at 11:42
    @Gordon I guess it depends on what the OP really wants. If he wants to clean up *all* the Microsoft stuff, HTMLPurifier is indeed the best method I know. If he wants to do *exactly* what he shows above (and nothing more), it's different. – Pekka Mar 20 '12 at 11:43
  • @Ste can you please clarify what you are after: cleaning this particular snippet or cleaning all the Microsoft stuff altogether. – Gordon Mar 20 '12 at 11:46

6 Answers


This would work:

echo preg_replace(
    '(
        mso-   # match anything with the mso vendor prefix
        .+?    # followed by at least one character
        ;      # up to the first semicolon
        [ ]*   # and any trailing spaces
    )xi',
    '',        // replace that match with nothing
    $html
);

However, if $html contains more than just that one line of HTML, have a look at Grabbing the href attribute of an a element to learn how to easily and reliably fetch attributes from elements in HTML. Then use the regex above to replace the attribute node values.
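A minimal sketch of that approach, assuming the DOM extension is available and the input has a single root element (LIBXML_HTML_NOIMPLIED may not round-trip cleanly when there are several top-level nodes). The function name stripMsoStyles and the per-declaration regex are illustrative choices, not part of this answer. Because the regex only ever sees the contents of a style attribute, it cannot run past the attribute's closing quote.

```php
<?php
// Sketch: remove mso-* declarations from every style attribute, applying the
// regex to attribute values only instead of to the raw HTML string.
function stripMsoStyles(string $html): string
{
    $doc = new DOMDocument();
    // Suppress any libxml parse warnings caused by messy Word markup.
    $previous = libxml_use_internal_errors(true);
    // Assumption: one root element; these flags avoid adding <html>, <body> and a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matched attribute nodes into an array before modifying them.
    foreach (iterator_to_array($xpath->query('//@style')) as $attr) {
        // Remove one "mso-name: value;" declaration at a time, plus trailing whitespace.
        $clean = trim(preg_replace('/mso-[a-z0-9-]+\s*:[^;]*;?\s*/i', '', $attr->value));
        if ($clean === '') {
            // Nothing left: drop the now-empty style attribute.
            $attr->ownerElement->removeAttribute('style');
        } else {
            $attr->value = $clean;
        }
    }
    return $doc->saveHTML();
}

echo stripMsoStyles('<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;" class=MsoNormal>text</P>');
```

Note that saveHTML() re-serializes the markup, so tag names come back lowercased and class=MsoNormal comes back quoted.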

Gordon

I've tested Dr.Kameleon's solution: it works, but not in every situation. For example, in the following code the mso-* declarations are not removed:

<p style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto' class=MsoNormal>text</P>

(I removed some of the spaces and the final ";".)

So I suggest this improvement to Dr.Kameleon's code:

$cleanHtml = preg_replace('(mso-[a-z0-9\s\-:;]+)i', '', $html);
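A quick check against the question's original input, with a tighter alternative that is an assumption of mine rather than part of this answer. Because the character class also contains letters, digits, whitespace, ':' and ';', a single match keeps going through "padding: 4px;" and empties the style attribute. The bounded pattern removes one declaration at a time and refuses to cross a quote character.

```php
<?php
$html = '<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;" class=MsoNormal>text</P>';

// The pattern from this answer: the class also covers "padding: 4px;",
// so one match runs all the way to the closing quote.
$greedy = preg_replace('(mso-[a-z0-9\s\-:;]+)i', '', $html);

// Bounded variant: name, colon, then a value that stops at ';' or a quote.
// The ';' is optional so a final declaration without one is still removed.
$bounded = preg_replace('/mso-[a-z0-9-]+\s*:[^;"\']*;?\s*/i', '', $html);

echo $greedy, "\n";  // <P style="" class=MsoNormal>text</P>
echo $bounded, "\n"; // <P style="padding: 4px;" class=MsoNormal>text</P>
```

The bounded pattern also handles the compact input shown above, turning it into style=''.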

Best regards

Scipius2012

You can also try this one:

(mso-[^:]*:[^;]*;)

However, don't forget that parsing HTML with regex is a really big sin!
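To make that warning concrete, here is a hypothetical input (not from the question) in which the first style attribute has no trailing semicolon. With the pattern wrapped in / delimiters, [^;]* is not stopped by the closing quote, so the match continues to the next ";" anywhere in the document and deletes real content along the way.

```php
<?php
// Hypothetical input: the first style lacks a final ';', the second paragraph has one.
$html = "<p style='mso-bidi-font-size:11pt'>a</p><p style='color:red;'>b</p>";

// The pattern from this answer, wrapped in '/' delimiters.
$out = preg_replace('/(mso-[^:]*:[^;]*;)/', '', $html);

// [^;]* runs past the quote and the first paragraph up to "red;",
// so the text "a" and the second paragraph's color declaration disappear.
echo $out, "\n"; // <p style=''>b</p>
```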

WhoSayIn
preg_replace('/mso-.+?:\s*?.+?;/s', '', $html);
Juan Mellado
safrazik
<?php
$string = '<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;" class=MsoNormal>text</P>';
// Lazily match from "mso-" up to the nearest ";" and replace it with nothing.
$patterns = '/mso-(.*?);/';
$replacements = '';
echo preg_replace($patterns, $replacements, $string);
?>

Code:

$html = "<p style='mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;' class=MsoNormal>text</P>";

$cleanHtml = preg_replace('(mso-[a-z\-: ]+; )i', '', $html);

echo $cleanHtml;

Output:

<p style='padding: 4px;' class=MsoNormal>text</P>
Dr.Kameleon
    This is a valid regex solution. Although I'd use `~` or `/` as regex delimiters and put `-` at the end of the character class to avoid overescaping: `preg_replace('~mso-[a-z: -]+; ~i', '', $html)`. – Wiktor Stribiżew Dec 17 '20 at 10:27
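A small check of the regex from this comment. The helper name stripMso and the third test string are hypothetical; the first two inputs are the ones used in the answers above. It gives the expected output for the question's spacing, but because it requires "; " after each declaration and its character class has no digits, it leaves the compact input and a value such as 11pt untouched.

```php
<?php
// Hypothetical helper wrapping the regex suggested in the comment.
function stripMso(string $html): string
{
    return preg_replace('~mso-[a-z: -]+; ~i', '', $html);
}

$spaced  = "<p style='mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;' class=MsoNormal>text</P>";
$compact = "<p style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto' class=MsoNormal>text</P>";
$digits  = "<p style='mso-bidi-font-size: 11pt; color: red;'>text</p>";

echo stripMso($spaced), "\n";  // <p style='padding: 4px;' class=MsoNormal>text</P>
echo stripMso($compact), "\n"; // unchanged: no "; " follows the declarations
echo stripMso($digits), "\n";  // unchanged: "11pt" contains a digit, so no match
```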