Possible Duplicate:
What is the best free way to clean up Word HTML?
PHP to clean-up pasted Microsoft input

I allow clients to enter notes in a rich text editor, and I recently upgraded to CKEditor 3.x, which by default strips MS Word classes, styles, and comments when users paste into the editor. So moving forward I'm all set.

I now need to clean up five years' worth of existing notes, some of which have MS Word-generated HTML embedded. I need to loop through this body of text and clean it.

I do not need to strip out all span tags, only those identified as written by Microsoft.
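
To make the requirement concrete, here is a minimal Python sketch of that kind of selective cleanup. It is only an illustration under stated assumptions: the function names are hypothetical, the regular expressions are a guessed, non-exhaustive list of what Word typically emits (conditional comments, namespaced tags such as `<o:p>`, `Mso*` classes, `mso-*` style declarations), and it is not how HTMLCleaner, word2cleanhtml.com, or any other tool mentioned here works. Regex-based cleaning is fragile on malformed HTML, so a real parser-based library is the safer choice for production data.

```python
import re

# Assumed, non-exhaustive patterns for markup that Word tends to emit.
CONDITIONAL_COMMENT = re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
OFFICE_TAG = re.compile(r"</?[a-z]+:[^>]*>", re.IGNORECASE)  # e.g. <o:p>, </w:WordDocument>
MSO_CLASS = re.compile(r'\s+class=(?:"Mso\w*"|\'Mso\w*\'|Mso\w*)', re.IGNORECASE)
STYLE_ATTR = re.compile(r'\s+style="([^"]*)"', re.IGNORECASE)
# A span with no attributes whose content holds no other span tags.
BARE_SPAN = re.compile(r"<span>((?:(?!</?span\b).)*)</span>", re.DOTALL | re.IGNORECASE)


def _drop_mso_declarations(match):
    """Keep a style attribute's non-mso declarations; drop the attribute if none remain."""
    declarations = [d.strip() for d in match.group(1).split(";")]
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    return ' style="{}"'.format("; ".join(kept)) if kept else ""


def strip_word_markup(html):
    """Remove Word-specific markup while leaving other tags and attributes alone."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)  # removes the tags only, not the text between them
    html = MSO_CLASS.sub("", html)
    html = STYLE_ATTR.sub(_drop_mso_declarations, html)
    # Unwrap innermost attribute-less spans until nothing changes.
    # Spans that still carry a class or style are kept.
    previous = None
    while previous != html:
        previous = html
        html = BARE_SPAN.sub(r"\1", html)
    return html


def clean_notes(notes):
    """Apply the cleanup to every stored note (the loop over the body of text)."""
    return [strip_word_markup(note) for note in notes]
```

With these patterns, a span whose only attribute was an `mso-` style is unwrapped, while a span such as `<span class="note">` is left untouched. Spans that were already attribute-less are unwrapped too, on the assumption that they carry no meaning.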

I've tried using HTMLCleaner, but it does not remove the MS-generated HTML. http://word2cleanhtml.com does exactly what I want; however, the developers are not currently offering an API for public use (as of July 9, 2012).

I've looked for such a class off and on for the last few weeks and am not having much luck. Have any of you found a useful class you'd like to share?

a coder
  • To clarify, I need a server-side class that I can embed in my existing application. There are some GREAT tools in the SO questions/answers referred by mario, however I'm finding that they are designed for one-shot conversions for the most part -- or using curl to post out to their website. Can't do that with PHI. – a coder Jul 09 '12 at 17:55
  • Mario, where have your URLs gone? They were useful, and part of my justification for closure. They should be useful to the OP to do the HTML cleanup using a retrospective script, too. – halfer Jul 09 '12 at 17:57
  • I hope this question will be left available, as maxhud's answer below addresses my question to a T. htmlpurifier is a class, not a website that users browse & upload to. – a coder Jul 09 '12 at 20:53
  • I noticed the two new links added to the top of my question. The first does not fully answer what I'm looking for. The second however, does but did not come up in my initial search (apologies). The accepted answer is the same as maxhud's below. – a coder Jul 09 '12 at 20:56
  • Hi a_coder; yes, if a question is closed like this, it should always be available - I think this always happens if the question auto-closes due to the necessary five close votes having been reached. When manual deleting, some mods remove questions entirely, but I don't think that happens in these ordinary cases. – halfer Jul 09 '12 at 20:58

1 Answer

http://htmlpurifier.org/

This will do what you want.

Max Hudson