I have been at this on and off for a few days, but my RegEx mastery is not great. Yes, I understand that RegEx is not for parsing HTML. I am doing server-side "cleaning" of CKEditor input; CKEditor already does this, but only client-side.
After stripping non-whitelisted tags...
First: $html = preg_replace('/ on\w+=(["\'])[^\1]*?\1/', '', $html);
Remove all event attributes properly quoted with either ' or " quotes.
Second: $html = preg_replace('/ on\w+=\S+/', '', $html);
Remove the ones that have no quotes but can still fire, e.g. onclick=blowUpTheBase()
What I would like to do is ensure the onEvent is between < and >, but I can only get it to work if the onEvent attribute is the first one after the tag name. Everything I try ends up capturing most of the code. I just can't get it lazy enough.
e.g. $html = preg_replace('/<([\s\S]?)( on\w+=\S+) ([\s\S]*?)>/', '<$1 $3>', $html);
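For illustration, one way the "only between < and >" constraint might be sketched is to split the work in two: first match each opening tag as a whole, then strip the on* attributes inside that single match with preg_replace_callback. The function name stripEventAttributes is hypothetical, and this is a rough sketch rather than a complete sanitizer: it assumes attributes are separated by whitespace, and it will also remove on...= text that happens to sit inside another attribute's quoted value.

```php
<?php
// Hypothetical helper (not from the original post): strips on*= attributes,
// but only when they appear inside an opening tag.
function stripEventAttributes(string $html): string
{
    // Step 1: match one opening tag. Quoted values are consumed whole, so a ">"
    // inside a quoted attribute value does not end the tag early.
    $tagPattern = '/<[a-z][^\s>\/]*(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';

    $result = preg_replace_callback($tagPattern, function (array $m): string {
        // Step 2: within this single tag, drop every on*= attribute,
        // whether its value is double-quoted, single-quoted, or unquoted.
        $clean = preg_replace(
            '/\s+on\w+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
            '',
            $m[0]
        );
        // If the inner regex fails, drop the whole tag rather than keep it.
        return $clean ?? '';
    }, $html);

    // preg_replace_callback returns null on a regex error; fail closed.
    return $result ?? '';
}

$dirty = '<p class="a" onclick="alert(1)" id=b onmouseover=evil()>text onload=1</p>';
echo stripEventAttributes($dirty), "\n";
// prints: <p class="a" id=b>text onload=1</p>
```

Because the inner replacement only ever sees one tag at a time, laziness across the whole document stops being a problem, and text outside of tags (such as "text onload=1" above) is left alone.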
EDIT: I am going to select @colburton's answer because RegEx is what I asked for. I will also use it for my particular situation because it will do the trick (it is an internal application anyhow).
BUT
I want to thank @Casimir et Hippolyte for his answer because it gives a great example and explanation of how to do this the "right way". I will shortly write up a function using DOMDocument, and it will become my go-to way of handling RTE/WYSIWYG/HTML input.
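For reference, a minimal sketch of what such a DOMDocument-based function might look like. This is not the code from either answer; the function name removeEventAttributesWithDom and the wrapper-div trick are assumptions made for illustration, and the sketch assumes the input is ASCII or entity-encoded (other encodings would need an explicit encoding hint before loading).

```php
<?php
// Hypothetical helper (an illustrative sketch, not a complete sanitizer):
// parses an HTML fragment and removes every attribute whose name starts with "on".
function removeEventAttributesWithDom(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // since editor output is rarely perfectly valid HTML.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so the original content can be found again later.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The XPath result is a static list, so removing attributes while looping is safe.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attr) {
        if (stripos($attr->nodeName, 'on') === 0) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    // Serialize only the children of the wrapper div (the first div in the document).
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeEventAttributesWithDom('<p onclick="x()" class="a">Hi <b onmouseover=y()>there</b></p>'), "\n";
```

Since the parser decides what is a tag and what is an attribute, quoting style and attribute order no longer matter; the trade-off is that the markup comes back re-serialized, so it may not be byte-for-byte identical to the input.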