
I am working with HTML documents generated from Microsoft Word 2007/2010. Besides generating incredibly dirty HTML, Word also tends to use both block (stylesheet) and inline styles. I am looking for a PHP library that would merge the block styles into the already existing inline style of each element.

Edit: The goal is to construct an HTML block that preserves the original formatting and is editable in a WYSIWYG editor like TinyMCE.

Example

If the original html is:

    <html>
    <head>
    <style>
    .normaltext {color:black;font-weight:normal;font-size:10pt}
    .important {color:red;font-weight:bold;font-size:11pt}
    </style>
    </head>
    <body>
    <p class="normaltext" style="font-family:arial">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat. 
    <span class="important">Nam in purus nisi</span>, vitae dictum ligula. 
    Morbi mattis eros eget diam vulputate imperdiet. 
    <span class="important" style="color:green">Integer</span> a metus eros. 
    Sed iaculis porta imperdiet.
    </p>
    </body>
    </html>

Should become:

    <html>
    <head>
    </head>
    <body>
    <p style="font-family:arial;color:black;font-weight:normal;font-size:10pt">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat. 
    <span style="color:red;font-weight:bold;font-size:11pt">Nam in purus nisi</span>, vitae dictum ligula. 
    Morbi mattis eros eget diam vulputate imperdiet. 
    <span style="color:green;font-weight:bold;font-size:11pt">Integer</span> a metus eros. 
    Sed iaculis porta imperdiet.
    </p>
    </body>
    </html>
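
For reference, the transformation in the example can be sketched with PHP's built-in DOM extension alone. This is a minimal sketch, not a library: it assumes only simple `.classname { ... }` rules (no compound selectors, media queries or specificity), treats "inline beats class" as the only precedence rule, and lets the first listed class win when two classes set the same property. The function names `parseDeclarations` and `inlineClassStyles` are hypothetical.

```php
<?php
// Sketch: move simple ".class { ... }" rules from <style> blocks into each
// element's style attribute. Existing inline declarations keep their value
// and their position; class properties only fill in what is missing.

/** Parse "a:b;c:d" into ['a' => 'b', 'c' => 'd'], preserving order. */
function parseDeclarations(string $css): array
{
    $props = [];
    foreach (explode(';', $css) as $decl) {
        $parts = explode(':', $decl, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $props[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $props;
}

function inlineClassStyles(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // Word HTML triggers many parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    // 1. Collect class rules from every <style> element, then remove it.
    //    Copy the live node list first so removal does not skip elements.
    $rules = [];
    foreach (iterator_to_array($doc->getElementsByTagName('style')) as $style) {
        preg_match_all('/\.([\w-]+)\s*\{([^}]*)\}/', $style->textContent, $m, PREG_SET_ORDER);
        foreach ($m as $match) {
            $rules[$match[1]] = array_merge($rules[$match[1]] ?? [], parseDeclarations($match[2]));
        }
        $style->parentNode->removeChild($style);
    }

    // 2. Merge the matching rules into every element that has a class.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@class]') as $el) {
        $props = parseDeclarations($el->getAttribute('style'));   // inline first
        foreach (preg_split('/\s+/', trim($el->getAttribute('class'))) as $class) {
            foreach ($rules[$class] ?? [] as $name => $value) {
                if (!isset($props[$name])) {
                    $props[$name] = $value;   // only fill in what inline did not set
                }
            }
        }
        $pairs = [];
        foreach ($props as $name => $value) {
            $pairs[] = "$name:$value";
        }
        if ($pairs) {
            $el->setAttribute('style', implode(';', $pairs));
        } else {
            $el->removeAttribute('style');
        }
        $el->removeAttribute('class');
    }
    return (string) $doc->saveHTML();
}

// Usage: the span ends up with style="color:green;font-weight:bold".
$demo = '<html><head><style>.important {color:red;font-weight:bold}</style></head>'
      . '<body><span class="important" style="color:green">Integer</span></body></html>';
echo inlineClassStyles($demo);
```

Filling in missing properties instead of overwriting keeps the inline declarations first, which happens to reproduce the property order of the expected output above.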
hakre
ltfishie
  • So you're trying to get rid of the CSS blocks and move all of it inline? Or the other way around? – Francois Deschenes Mar 26 '12 at 05:39
  • The value of this is questionable. It's a complex task since it requires full DOM and CSS parsers and it's likely to lead to considerably more bloat than you already have. – SpliFF Mar 26 '12 at 05:43

4 Answers


See the CssToInlineStyles project, which does exactly what you want.

emeraldjava

Check out:

Porting the code from either source to PHP, or using any of the available APIs, should get your CSS styles inline.

soulseekah
  • Thanks, emogrifier seems very close; I will try it out. I already tried http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php and it doesn't merge styles. – ltfishie Apr 02 '12 at 01:29

I finally managed to get it to work. The code is based on http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php with one simple change: moving the line

    // add new properties into the list
    foreach($rule['properties'] as $key => $value) $properties[$key] = $value;

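up to the beginning of the loop, right after $properties is declared. That way the rule's properties are written first, and the declarations read afterwards from the element's existing style attribute overwrite them, so the inline style takes precedence instead of being replaced.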
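
Why the position of that line matters can be shown with a stripped-down model of the per-element merge. This is an illustration only: `mergeRuleLast` and `mergeRuleFirst` are hypothetical helpers, not functions from the blog's code, and `$inline`/`$rule` stand in for the parsed style attribute and the matching rule's properties.

```php
<?php
// Model of the merge step. $inline = declarations already in style="",
// $rule = declarations from a matching CSS rule (both name => value).

// Original order: inline read first, rule written afterwards, so the rule wins.
function mergeRuleLast(array $inline, array $rule): array
{
    $properties = $inline;
    foreach ($rule as $key => $value) $properties[$key] = $value;
    return $properties;
}

// Changed order: rule written first, inline read afterwards, so inline wins.
function mergeRuleFirst(array $inline, array $rule): array
{
    $properties = [];
    foreach ($rule as $key => $value) $properties[$key] = $value;
    foreach ($inline as $key => $value) $properties[$key] = $value;
    return $properties;
}

$inline = ['color' => 'green'];
$rule   = ['color' => 'red', 'font-weight' => 'bold', 'font-size' => '11pt'];

print_r(mergeRuleLast($inline, $rule));   // color => red: the inline value is lost
print_r(mergeRuleFirst($inline, $rule));  // color => green: the inline value is kept
```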

To make this work with WordPress, however, one additional change is needed. DOMDocument replaces &nbsp; in the document with blanks, which breaks the WordPress update statement and leads to content being cut off. Please refer to my other question for the solution: "DOMDocument->saveHTML() converting &nbsp; to space".
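
One commonly used workaround (an assumption here, not necessarily the fix given in the linked question): DOMDocument resolves `&nbsp;` into the character U+00A0 while parsing, and with a UTF-8 document `saveHTML()` typically writes that character out as raw bytes rather than as the entity, so the entity can be restored after serializing. The sketch assumes the document is UTF-8, declared via a meta tag; `saveHtmlWithNbsp` is a hypothetical helper name.

```php
<?php
// Restore &nbsp; entities after serializing a UTF-8 DOMDocument.
function saveHtmlWithNbsp(DOMDocument $doc): string
{
    $html = (string) $doc->saveHTML();
    // "\xC2\xA0" is U+00A0 (NO-BREAK SPACE) encoded as UTF-8.
    return str_replace("\xC2\xA0", '&nbsp;', $html);
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . '<body><p>a&nbsp;b</p></body></html>');
echo saveHtmlWithNbsp($doc);   // the paragraph contains "a&nbsp;b" again
```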

This problem is detailed in https://wordpress.stackexchange.com/questions/48692/post-content-getting-cut-off-on-blank-space-on-wpdb-update. If you know why this happens in WordPress, please post your answer there, as I would very much like to find out the cause.

Community
ltfishie

No, but try this instead: copying and pasting from Word into http://ckeditor.com/ or TinyMCE, etc. cleans it up a lot. Though it's still not perfect, it will get you much closer.

user1289347