I need to validate HTML user input in a web app using JavaScript.
What I have done so far, based on this question: I'm using a third-party library, sanitize-html, to sanitize the input and then compare the result with the original. If they differ, the HTML is considered invalid.
import sanitizeHtml from 'sanitize-html';

// sanitizationConfig is defined elsewhere in the app.
const isValidHtml = (html: string): boolean => {
  let sanitized = sanitizeHtml(html, sanitizationConfig);
  // Strip whitespace and <br> variants, since browsers serialize <br> differently.
  sanitized = sanitized.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');
  html = html.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');
  return sanitized === html;
};
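To make the comparison step concrete, here is a small sketch of what the replace chain does on its own. The helper name normalizeForComparison is hypothetical; only its body comes from the function above. Note that because whitespace is removed first, `<br />` is reduced to `<br/>` and then dropped, and all whitespace differences in text are ignored as a side effect.

```typescript
// Hypothetical helper name; the body mirrors the replace chain in isValidHtml.
const normalizeForComparison = (html: string): string =>
  html.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');

// "<br />" loses its space first, becomes "<br/>", and is then removed.
console.log(normalizeForComparison('<p>a <br /> b</p>')); // <p>ab</p>

// Side effect: inputs that differ only in whitespace compare as equal.
console.log(
  normalizeForComparison('<p>a b</p>') === normalizeForComparison('<p>ab</p>')
); // true
```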
The above method works fine with unescaped HTML, but not with escaped HTML.
isValidHtml('<'); // false
isValidHtml('&lt;'); // true
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // true, but this should also be false!
- Am I missing something with this method?
- Is there a better way to do this task?
EDIT: As suggested by @brad in the comments, I tried decoding the HTML first:
decodeHtml(html: string): string {
  const txt = document.createElement('textarea');
  txt.innerHTML = html;
  const decodedHtml = txt.value;
  txt.textContent = null;
  return decodedHtml;
}
and then called isValidHtml(decodedHtml). I got this result:
isValidHtml('<'); // false
isValidHtml('&lt;'); // false, but this should be true!
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // false
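Decoding first turns the escaped and unescaped inputs into the same string before validation runs, so they can only get the same answer. One possible way around this, offered as a sketch under assumptions rather than a verified fix, is to check the raw input and then the decoded input, comparing the second pass after decoding again. Everything named here is hypothetical: decodeBasicEntities stands in for the textarea-based decodeHtml so the example runs without a DOM, and stubSanitize is a toy replacement for sanitize-html whose real output may differ.

```typescript
// Hypothetical stand-in for the textarea-based decodeHtml; it handles only a
// few named entities so the sketch runs without a DOM.
const ENTITIES: Record<string, string> = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&amp;': '&',
};
const decodeBasicEntities = (html: string): string =>
  html.replace(/&(?:lt|gt|quot|#39|amp);/g, (entity) => ENTITIES[entity]);

// Same normalization as in isValidHtml.
const normalize = (html: string): string =>
  html.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');

// Toy sanitizer for illustration only (NOT sanitize-html): removes <script>
// tags and escapes a "<" that does not start a tag.
const stubSanitize = (html: string): string =>
  html.replace(/<\/?script\b[^>]*>/gi, '').replace(/<(?![a-zA-Z\/])/g, '&lt;');

type Sanitizer = (html: string) => string;

const isValidHtmlTwoPass = (html: string, sanitize: Sanitizer): boolean => {
  // Pass 1: the raw input must survive sanitization unchanged.
  if (normalize(sanitize(html)) !== normalize(html)) return false;
  // Pass 2: the decoded input must survive as well. The sanitized result is
  // decoded again so that a re-escaped "<" is not counted as a change.
  const decoded = decodeBasicEntities(html);
  return normalize(decodeBasicEntities(sanitize(decoded))) === normalize(decoded);
};

console.log(isValidHtmlTwoPass('<', stubSanitize)); // false
console.log(isValidHtmlTwoPass('&lt;', stubSanitize)); // true
console.log(isValidHtmlTwoPass('<script>', stubSanitize)); // false
console.log(isValidHtmlTwoPass('&lt;script&gt;', stubSanitize)); // false
```

The sanitizer is passed in as a parameter so the same check could be tried with sanitizeHtml and the real sanitizationConfig; whether it then produces these four results depends on how that library serializes its output.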