0

Can this little function return valid HTML?

function HtmlSanitizer(text) {
    return text.replace(/&(?!\s)|</g, function (s) { if(s == '&') return '&amp;'; else return '&lt;'});
}

Edit: The objective of this function is to avoid HTML injection. That's why I'm asking.

razpeitia

2 Answers

1

That function only encodes a small fraction of the characters that you may want to encode as HTML entities. As such, I would say the answer to your question is "no".

You might want to search for something like 'javascript html entity encode' to find something more complete.

Mike Brant
1

It's a regular expression:

  • / /g = global replace, i.e. replace all occurrences in the text string
  • & = matches ampersands in the text; no escaping is needed because & isn't a reserved character in JS regex
  • (?!) = a "negative lookahead": the match succeeds only if what follows does not match the enclosed pattern (not to be confused with the independent use of ? for a non-greedy search)
  • \s = matches any whitespace character, so &(?!\s) matches an ampersand that is not followed by whitespace
  • |< = alternation: if it didn't match an ampersand, it will try to match an opening angle bracket

Each match is then passed to the callback function, which replaces & with &amp; and < with &lt;. Note that because of the lookahead, an ampersand followed by whitespace is left as-is. This works as a basic way to HTML-encode a string, but it isn't robust in my opinion.
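To make the breakdown above concrete, here is a small sketch that runs the function from the question on a few inputs. The sample strings are my own, chosen only for illustration; the function body is the original one, just reformatted.

```javascript
// The function from the question (same regex, same replacements).
function HtmlSanitizer(text) {
    return text.replace(/&(?!\s)|</g, function (s) {
        return s === '&' ? '&amp;' : '&lt;';
    });
}

// '<' and an ampersand not followed by whitespace are encoded; '>' is not.
console.log(HtmlSanitizer('<b>AT&T</b>'));   // &lt;b>AT&amp;T&lt;/b>

// The negative lookahead skips an ampersand that is followed by whitespace.
console.log(HtmlSanitizer('fish & chips'));  // fish & chips

// Quotes are never touched, which matters inside attribute values.
console.log(HtmlSanitizer('say "hi"'));      // say "hi"
```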

SGML (and its derivatives, HTML and XML) prefers that syntactical characters be completely encoded, so every occurrence of an ampersand, an opening or closing angle bracket, or a quote should be encoded, whereas the function you provided only handles two of those (granted, those two are the most important).
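As a rough sketch of what "completely encoded" could look like, something along these lines covers all five characters. The names escapeHtml and HTML_ESCAPES are hypothetical, not from the question or any library, and this is only one way to do it, assuming the output goes into HTML text or a quoted attribute value.

```javascript
// Hypothetical lookup table (illustration only): the five characters with
// syntactic meaning in HTML text and quoted attribute values.
const HTML_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

// Hypothetical helper: encodes every occurrence, with no lookahead exceptions.
function escapeHtml(text) {
    return String(text).replace(/[&<>"']/g, function (ch) {
        return HTML_ESCAPES[ch];
    });
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Unlike the original, this also encodes an ampersand that is followed by whitespace, so "Tom & Jerry" becomes "Tom &amp; Jerry".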

I recommend reading this entry: HTML-encoding lost when attribute read from input field

Community
Dai