I am developing a web application using Python/Flask/SQLAlchemy on the server side.
I am using the wysihtml rich text editor to let users enter text containing a very limited subset of HTML. While wysihtml sanitizes the resulting HTML on the client side, some kind of server-side check is required to ensure that only that subset is accepted. To be clear, the input must not only be valid HTML, it must also contain only a very limited set of tags. It does not have to be a complete HTML document.
I would also like to know when non-compliant HTML is submitted, since that indicates either a bug in the client-side validation or a (likely malicious) attempt to bypass it.
I could use Bleach to sanitize the user-supplied HTML, but it does not work as a validator (there is no easy way to tell whether the sanitized HTML has been substantively changed), and the developer has made clear that he regards validation as outside the scope of his tool.
I have looked, but there doesn't appear to be a standard tool for doing validation in these circumstances.
I would prefer not to roll my own if I don't have to, for two reasons: it will take extra time, and I don't want to risk making rookie mistakes.
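For reference, the roll-your-own check I have in mind would be roughly the following. It is only a minimal sketch built on the standard library's `html.parser`; the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SubsetChecker` and `find_violations` are made up for illustration, and the tag list is a placeholder rather than my real wysihtml rule set.

```python
from html.parser import HTMLParser

# Hypothetical allowlist; the real one would mirror the wysihtml parser rules.
ALLOWED_TAGS = {"p", "br", "b", "i", "ul", "ol", "li"}
# Hypothetical mapping of tag -> permitted attribute names (none allowed here).
ALLOWED_ATTRS = {}


class SubsetChecker(HTMLParser):
    """Collects every construct that falls outside the allowed subset."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"disallowed tag <{tag}>")
            return
        for name, _value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                self.violations.append(
                    f"disallowed attribute {name!r} on <{tag}>"
                )

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"disallowed closing tag </{tag}>")

    # Comments, doctypes and processing instructions are never expected
    # in editor output, so each one is recorded as a violation.
    def handle_comment(self, data):
        self.violations.append("comment")

    def handle_decl(self, decl):
        self.violations.append("declaration")

    def handle_pi(self, data):
        self.violations.append("processing instruction")

    def unknown_decl(self, data):
        self.violations.append("unknown declaration")


def find_violations(fragment):
    """Return a list of problems; an empty list means the fragment passed."""
    checker = SubsetChecker()
    checker.feed(fragment)
    checker.close()
    return checker.violations
```

A Flask view could reject the request and log the submission whenever `find_violations(html)` returns a non-empty list. Note that this sketch does not check nesting or whether tags are balanced, and `HTMLParser` is lenient about malformed markup, which is exactly the kind of gap that makes me wary of maintaining this myself.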
So can anybody point me to a standard method for doing this server-side in Python? And, if not, why doesn't one exist? Is the thinking behind my need for one misguided, and if so why?