210

Apparently, this is harder to find than I thought it would be. And it's even so simple...

Is there a function equivalent to PHP's htmlspecialchars built into JavaScript? I know it's fairly easy to implement that yourself, but using a built-in function, if available, is just nicer.

For those unfamiliar with PHP, htmlspecialchars translates stuff like <htmltag/> into &lt;htmltag/&gt;

I know that escape() and encodeURI() do not work this way.

Peter Mortensen
Bart van Heukelom
  • php has got some really good tools: var_dump, print_r, htmlspecialchars, etc. Unfortunately, I suspect it is not the same with js. js alert is so poor. A fast way to see that some unexpected (and invisible in an alert box) string is coming is to alert the string length instead of the string itself. – Melsi Apr 26 '14 at 01:45
  • Possible duplicate of [Escaping HTML strings with jQuery](http://stackoverflow.com/questions/24816/escaping-html-strings-with-jquery) – nhahtdh Nov 30 '15 at 06:16
  • See https://stackoverflow.com/a/12034334/8804293, it has a great answer – Elijah Mock May 06 '20 at 16:39

18 Answers

379

There is a problem with your solution code--it will only escape the first occurrence of each special character. For example:

escapeHtml('Kip\'s <b>evil</b> "test" code\'s here');
Actual:   Kip&#039;s &lt;b&gt;evil</b> &quot;test" code's here
Expected: Kip&#039;s &lt;b&gt;evil&lt;/b&gt; &quot;test&quot; code&#039;s here
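The question's code isn't reproduced here, but this is the classic symptom of passing a string as the first argument to replace(): only the first match is replaced. A regular expression with the g flag replaces every match, as this small sketch shows:

```javascript
// A string pattern: replace() swaps only the FIRST occurrence.
const firstOnly = 'a<b<c'.replace('<', '&lt;');

// A global regular expression: every occurrence is replaced.
const everyOne = 'a<b<c'.replace(/</g, '&lt;');

console.log(firstOnly); // a&lt;b<c
console.log(everyOne);  // a&lt;b&lt;c
```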

Here is code that works properly:

function escapeHtml(text) {
  return text
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&#039;");
}

Update

The following code will produce identical results to the above, but it performs better, particularly on large blocks of text (thanks jbo5112).

function escapeHtml(text) {
  var map = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#039;'
  };
  
  return text.replace(/[&<>"']/g, function(m) { return map[m]; });
}
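A further tweak, sketched here as something to benchmark rather than a measured result: hoist the map and the regular expression out of the function so they are created once instead of on every call. The names HTML_ESCAPES and HTML_ESCAPE_RE are only illustrative.

```javascript
// Created once, when the script loads, rather than on every call.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#039;'
};
// Reusing a /g regex is safe here: replace() restarts matching from index 0 on each call.
const HTML_ESCAPE_RE = /[&<>"']/g;

function escapeHtml(text) {
  // Each matched character is looked up in the table.
  return text.replace(HTML_ESCAPE_RE, function (m) { return HTML_ESCAPES[m]; });
}

console.log(escapeHtml('Kip\'s <b>evil</b> "test" code\'s here'));
// Kip&#039;s &lt;b&gt;evil&lt;/b&gt; &quot;test&quot; code&#039;s here
```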
Community
Kip
  • nice thing about this function is that it works in node.js which doesn't have a dom by default – booyaa Feb 01 '13 at 10:46
  • It's faster to use a single replace and mapping function, and the single replace scales much better. (http://jsperf.com/escape-html-special-chars/11) – jbo5112 May 10 '14 at 18:07
  • @jbo5112 good point, I didn't realize JS allowed callbacks for replacement. This code is easier to understand though, and I doubt that shaving a few milliseconds off of escapeHtml() is going to make a difference unless you are calling it hundreds of times in a row for some reason. – Kip May 21 '14 at 21:28
  • This will distort URLs in text, which makes them unusable for plugins like [Autolinker.js](https://github.com/gregjacobs/Autolinker.js/). Is there any way to approach this? – Radek Matěj Feb 15 '17 at 10:03
  • @RadekMatěj Not sure without an example, but that sounds like a bug you should report to the creator of the plugin. The only one of those characters that should be in a URL is `&`. Is it possible the `&` is already encoded as `&amp;`, and this code encodes it again to `&amp;amp;`, so that `&amp;` is what is actually shown on screen? – Kip Feb 16 '17 at 14:24
  • @Kip If user input is `Hey, look at g.com?a=1&b=2 & tell me what do you think.`, only the second `&` has to be encoded. I suppose it should be the responsibility of the plugin to encode everything **except** URLs, because it possesses the code that identifies URLs. – Radek Matěj Feb 18 '17 at 09:10
  • @RadekMatěj Even in that case it is perfectly valid (preferable, I would argue) for both ampersands to be encoded when used in an HTML document. I would still consider it a bug with the plugin. – Kip Feb 20 '17 at 19:48
  • I had an issue with `.replace(/'/g, "&#039;");`: the browser was converting it back to an apostrophe in the inline JavaScript. The workaround was to add an escape character: `.replace(/'/g, "\\'");` – soulrider Sep 28 '18 at 18:53
  • The map variable should be declared above the function so that it isn't re-created each time; it will perform even better. Here's a test: https://jsperf.com/compare-var-inside-and-outside-escapehtml-function/1 Here's another test using inline as well: https://jsperf.com/compare-escape-var-inside-inline/1 – ADJenks Feb 26 '19 at 20:35
37

That's HTML encoding. There's no native JavaScript function to do that, but you can search online and find some nicely done ones.

E.g. http://sanzon.wordpress.com/2008/05/01/neat-little-html-encoding-trick-in-javascript/

EDIT:
This is what I've tested:

var div = document.createElement('div');
var text = document.createTextNode('<htmltag/>');
div.appendChild(text);
console.log(div.innerHTML);

Output: &lt;htmltag/&gt;

o.k.w
  • Too bad, I'll just have to use a custom function then. – Bart van Heukelom Nov 24 '09 at 02:06
  • You can try the method in the link I've included in my post. Pretty neat concept indeed. – o.k.w Nov 24 '09 at 02:10
  • @o.k.w: Ok, first you linked to this: http://www.yuki-onna.co.uk/html/encode.html which does exactly what `encodeURIComponent` does and not at all what the OP asked. So can you edit please? I can't seem to undo my -1. – Crescent Fresh Nov 24 '09 at 02:14
  • Yah, that page's code looks logical but I didn't test it out. The new link though works, I've verified it myself. I've already updated the post some time back. – o.k.w Nov 24 '09 at 02:19
  • @BeauCielBleu: No. The only nodes that are created are a single `div` element and a text node. Creating a text node with text ` ` will just create a text node, not an `img` element. – Tim Down May 24 '15 at 10:25
  • Simpler version: `var div = document.createElement('div'); div.textContent = ''; console.log(div.innerHTML);` – hakatashi Nov 28 '16 at 04:27
  • That's a clever solution! – RedGuy11 Jan 27 '21 at 16:04
31

Worth a read: http://bigdingus.com/2007/12/29/html-escaping-in-javascript/

var escapeHTML = (function() {
  var MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&#34;',
    "'": '&#39;'
  };
  // MAP and repl are created once; the returned function reuses them on every call.
  var repl = function(c) { return MAP[c]; };
  return function(s) {
    return s.replace(/[&<>'"]/g, repl);
  };
})();

Note: Only escape a string once. Don't run this on already-encoded strings, e.g. &amp; becomes &amp;amp;
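To see why the note matters, here is a small sketch (the function is restated as a standalone var so the snippet runs on its own). Escaping an already-escaped string encodes the & of every entity again, so the browser would display the literal text &amp; instead of &.

```javascript
var escapeHTML = (function() {
  var MAP = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&#34;', "'": '&#39;' };
  var repl = function(c) { return MAP[c]; };
  return function(s) { return s.replace(/[&<>'"]/g, repl); };
})();

var once = escapeHTML('Tom & Jerry');  // "Tom &amp; Jerry" (correct)
var twice = escapeHTML(once);          // "Tom &amp;amp; Jerry" (double-encoded)
```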

Platinum Azure
Chris Jacob
  • This should be the accepted and highest voted answer. I'm not sure why it had no votes. This is benchmarking as the fastest with both a long (326KB Google search result) and short input string on jsperf (http://jsperf.com/escape-html-special-chars/11). Please vote this up. – jbo5112 May 10 '14 at 18:05
  • What is the difference between this one and the answer that got the highest votes? Why the additional inner function? An explanation could help users understand better. – Kosem May 05 '20 at 01:27
26

Here's a function to escape HTML:

function escapeHtml(str)
{
    var map =
    {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#039;'
    };
    return str.replace(/[&<>"']/g, function(m) {return map[m];});
}

And to decode:

function decodeHtml(str)
{
    var map =
    {
        '&amp;': '&',
        '&lt;': '<',
        '&gt;': '>',
        '&quot;': '"',
        '&#039;': "'"
    };
    return str.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function(m) {return map[m];});
}
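Because decodeHtml matches all five entities with one regular expression in a single pass, text that merely looks like an entity after escaping (such as &amp;lt;) is decoded only one level, so a round trip returns the original string. A quick self-contained check, with both functions restated in condensed form:

```javascript
function escapeHtml(str) {
  var map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return str.replace(/[&<>"']/g, function (m) { return map[m]; });
}

function decodeHtml(str) {
  var map = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#039;': "'" };
  // One regex, one pass: text produced by a replacement is never re-scanned.
  return str.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function (m) { return map[m]; });
}

var original = 'Type &lt; to get "<"';
var encoded = escapeHtml(original);  // 'Type &amp;lt; to get &quot;&lt;&quot;'
var decoded = decodeHtml(encoded);   // back to the original string
```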
Dan Bray
18

With jQuery it can be like this:

var escapedValue = $('<div/>').text(value).html();

From related question Escaping HTML strings with jQuery

As mentioned in the comments, double quotes and single quotes are left as-is by this implementation. That means this solution should not be used if you need to build an element attribute value as a raw HTML string.
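To show why the missing quote escaping matters, here is a small sketch with two hypothetical helpers written only for this illustration (they are not jQuery APIs): textOnlyEscape handles just &, < and >, roughly like serializing a text node, while fullEscape also handles both quotes. Placed inside a double-quoted attribute, the first lets a double quote in the value end the attribute early:

```javascript
// Hypothetical helper: escapes only &, < and > (no quotes).
function textOnlyEscape(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: also escapes both quote characters.
function fullEscape(s) {
  return textOnlyEscape(s).replace(/"/g, '&quot;').replace(/'/g, '&#039;');
}

var value = 'x" onmouseover="alert(1)';

// The " in value closes title="..." and injects an event-handler attribute.
var unsafe = '<span title="' + textOnlyEscape(value) + '">hi</span>';
// Here the quotes become &quot;, so the whole value stays inside title.
var safe = '<span title="' + fullEscape(value) + '">hi</span>';
```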

Community
Oleksandr Yanovets
  • any idea if there is any overhead to this--adding a dummy object to the DOM? – Kip Jan 29 '11 at 05:39
  • and are there any other advantages (say, if you have unicode characters or something)? – Kip Jan 29 '11 at 05:44
  • It definitely costs something, but for most tasks these days it will be OK. For a long chunk of text this could even be faster (depending on DOM implementation details) because replace runs 5 times. The advantage over a separate custom function with replacements is that this solution uses library code, which just feels safer :) The "replace" version, though, is a more accurate implementation of htmlspecialchars. – Oleksandr Yanovets Feb 04 '11 at 14:04
  • Something I found with this: double quotes and single quotes are left as-is. That makes this problematic if you want to use it in an attribute value. – Kip Jun 16 '11 at 19:15
  • For small chunks of text, this takes 30x as long as running all of the replaces. It does scale better though. With something as gigantic as a Google search result page (326KB), it's 25-30% faster than the replaces or doing this in straight javascript. However, they all consistently lose to a single replace and a mapping function. – jbo5112 May 10 '14 at 18:03
  • how people vote on this answer: answer has jquery: +1 - does NOT escape single and double quotes: ummmm.. (scratching head).. +1. **This answer should have a NEGATIVE score since it DOES NOT EVEN COME CLOSE TO ANSWERING THE QUESTION "HtmlSpecialChars equivalent".** it-does-not-escape-quotes-jesus-christ-and-other-deities. **OMG** you jquery people. – Sharky May 31 '14 at 08:57
  • I prefer this version for supporting % and + too: `$('<div/>').text(text).html().replace(/%/g, '%25').replace(/\+/g, '%2B');` – Dudi Nov 19 '15 at 12:05
8

Underscore.js provides a function for this:

_.escape(string)

Escapes a string for insertion into HTML, replacing &, <, >, ", and ' characters.

http://underscorejs.org/#escape

It's not a built-in JavaScript function, but if you are already using Underscore.js, it is a better alternative than writing your own function if your strings to convert are not too large.

Peter Mortensen
mer10z_tech
7

Yet another take on this is to forgo the character mapping altogether and instead convert all unwanted characters into their respective numeric character references, e.g.:

function escapeHtml(raw) {
    return raw.replace(/[&<>"']/g, function onReplace(match) {
        return '&#' + match.charCodeAt(0) + ';';
    });
}

Note that the specified RegEx only handles the specific characters that the OP wanted to escape but, depending on the context in which the escaped HTML is going to be used, these characters may not be sufficient. Ryan Grove’s article There's more to HTML escaping than &, <, >, and " is a good read on the topic. And depending on your context, the following RegEx may very well be needed in order to avoid XSS injection:

var regex = /[&<>"'` !@$%()=+{}[\]]/g
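Plugging that broader character class into the numeric-reference function gives something like the sketch below (escapeHtmlExtended is a hypothetical name). Note that it also escapes spaces, so the output is less readable, and whether this set is enough still depends on where the result is inserted.

```javascript
// Hypothetical name: numeric-reference escaping with the broader character class.
function escapeHtmlExtended(raw) {
  return raw.replace(/[&<>"'` !@$%()=+{}[\]]/g, function (match) {
    // e.g. '=' (code 61) becomes '&#61;'
    return '&#' + match.charCodeAt(0) + ';';
  });
}

console.log(escapeHtmlExtended('<a b=c>')); // &#60;a&#32;b&#61;c&#62;
```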
Fredric
4

Use:

String.prototype.escapeHTML = function() {
    return this.replace(/&/g, "&amp;")
               .replace(/</g, "&lt;")
               .replace(/>/g, "&gt;")
               .replace(/"/g, "&quot;")
               .replace(/'/g, "&#039;");
};

Sample:

var toto = "test<br>";
alert(toto.escapeHTML());
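Adding methods to String.prototype works, but it is generally discouraged because it can clash with other libraries or with methods added to the language later. A sketch of the same five replacements as a plain function, using String.prototype.replaceAll (ES2021, so it assumes a reasonably recent browser or Node.js 15+):

```javascript
function escapeHTML(str) {
  // & must go first; otherwise the & in the entities added below would be escaped again.
  return str
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#039;');
}

console.log(escapeHTML('test<br>')); // test&lt;br&gt;
```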
Peter Mortensen
patrick
  • An explanation would be in order. For example, why is it exactly those five characters? What is the logic behind the selection of them? Please respond by [editing your answer](https://stackoverflow.com/posts/22527356/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Jun 15 '21 at 21:57
  • Printable ASCII ranges from 32 to 126 according to https://theasciicode.com.ar/ . Only these 5 characters, &<>"', get encoded. – John Wong Aug 02 '22 at 04:42
4

Chances are you don't need such a function. Since your code is already in the browser*, you can access the DOM directly instead of generating and encoding HTML that will have to be decoded backwards by the browser to be actually used.

Use the innerText property to insert plain text into the DOM safely, much faster than with any of the presented escape functions, and even faster than assigning a static pre-encoded string to innerHTML.

Use classList to edit classes, dataset to set data- attributes and setAttribute for others.

All of these will handle escaping for you. More precisely, no escaping is needed and no encoding will be performed underneath**, since you are working around HTML, the textual representation of DOM.

// use existing element
var author = 'John "Superman" Doe <john@example.com>';
var el = document.getElementById('first');
el.dataset.author = author;
el.textContent = 'Author: '+author;

// or create a new element
var a = document.createElement('a');
a.classList.add('important');
a.href = '/search?q=term+"exact"&n=50';
a.textContent = 'Search for "exact" term';
document.body.appendChild(a);

// actual HTML code
console.log(el.outerHTML);
console.log(a.outerHTML);
.important { color: red; }
<div id="first"></div>

* This answer is not intended for server-side JavaScript users (Node.js, etc.)

** Unless you explicitly convert it to actual HTML afterwards, e.g. by accessing innerHTML; this is what happens when you run $('<div/>').text(value).html(); as suggested in other answers. So if your final goal is to insert some data into the document, doing it this way means you'll be doing the work twice. You can also see that in the resulting HTML not everything is encoded, only the minimum needed for it to be valid. This is done context-dependently, which is why this jQuery method doesn't encode quotes and therefore should not be used as a general-purpose escaper. Escaping quotes is needed when you're constructing HTML as a string with untrusted or quote-containing data in place of an attribute's value. If you use the DOM API, you don't have to care about escaping at all.

user
  • Thanks for this! I've spent way too long looking for such a simple solution. One important thing I've discovered is that if your text contains newlines, then you will have to either replace them with HTML line breaks (something like `el.textContent = str; el.innerHTML = el.innerHTML.replace(/\n/g, '<br>')`), or set the CSS `white-space` property to `pre` or `pre-wrap` – stellatedHexahedron Apr 20 '18 at 19:15
  • @stellatedHexahedron, thanks for raising this issue. I've changed my answer to recommend `innerText` instead of `textContent`. While it is a bit slower and has some [other differences](https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent#Differences_from_innerText) when reading the property, it's more intuitive in that it does the `<br>` replacement automatically when assigning to it. – user Apr 25 '18 at 18:00
4

By the books

OWASP recommends that "[e]xcept for alphanumeric characters, [you should] escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of [an] attribute."

So here's a function that does that, with a usage example:

function escapeHTML(unsafe) {
  return unsafe.replace(
    /[\u0000-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u00FF]/g,
    c => '&#' + ('000' + c.charCodeAt(0)).slice(-4) + ';'
  )
}

document.querySelector('div').innerHTML =
  '<span class=' +
  escapeHTML('"fakeclass" onclick="alert("test")') +
  '>' +
  escapeHTML('<script>alert("inspect the attributes")\u003C/script>') +
  '</span>'
<div></div>

You should verify the entity ranges I have provided to validate the safety of the function yourself. You could also use this regular expression which has better readability and should cover the same character codes, but is about 10% less performant in my browser:

/(?![0-9A-Za-z])[\u0000-\u00FF]/g
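The function above emits zero-padded decimal references such as &#0034;, which browsers treat the same as the hexadecimal &#xHH; form that OWASP describes. If you prefer to follow that wording literally, here is a sketch of a hexadecimal variant (escapeHTMLHex is a hypothetical name) that uses the more readable lookahead regex:

```javascript
function escapeHTMLHex(unsafe) {
  // Every code unit below 256 that is not an ASCII letter or digit.
  return unsafe.replace(/(?![0-9A-Za-z])[\u0000-\u00FF]/g, function (c) {
    // At least two uppercase hex digits, e.g. '"' (0x22) becomes '&#x22;'
    return '&#x' + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0') + ';';
  });
}

console.log(escapeHTMLHex('<a href="x">'));
// &#x3C;a&#x20;href&#x3D;&#x22;x&#x22;&#x3E;
```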

ADJenks
2
function htmlEscape(str){
    return str.replace(/[&<>'"]/g,x=>'&#'+x.charCodeAt(0)+';')
}

This solution uses the numerical code of the characters, for example < is replaced by &#60;.

Although its performance is slightly worse than the solution using a map, it has the advantages:

  • Not dependent on a library or DOM
  • Pretty easy to remember (you don't need to memorize the 5 HTML escape characters)
  • Little code
  • Reasonably fast (it's still faster than 5 chained replace)
user202729
2

I am elaborating a bit on o.k.w.'s answer.

You can use the browser's DOM functions for that.

var utils = {
    dummy: document.createElement('div'),
    escapeHTML: function(s) {
        this.dummy.textContent = s
        return this.dummy.innerHTML
    }
}

utils.escapeHTML('<escapeThis>&')

This returns &lt;escapeThis&gt;&amp;

It uses the standard function createElement to create an element that is never attached to the page, then sets its textContent property to put any string in it as plain text, and then reads its innerHTML property to get that content in its HTML representation.

Peter Mortensen
Jonas Eberle
2

// Encode the characters: &, <, >, ", '
function encodeHtml(str) {

  var map = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#039;'
  };

  return str.replace(/[&<>"']/g, function(m) {return map[m];});
}

// Decode the entities: &amp; &lt; &gt; &quot; &#039;
function decodeHtml(str) {

  var map = {
    '&amp;': '&',
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#039;': "'"
  };

  return str.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function(m) {return map[m];});
}

var str = `atttt ++ ' ' " " " " " + {}-´ñ+.'aAAAaaaa"`;

var str2 = `atttt ++ &#039; &#039; &quot; &quot; &quot; &quot; &quot; + {}-´ñ+.&#039;aAAAaaaa&quot;`;


console.log(encodeHtml(str));
console.log(decodeHtml(str2));
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="string">
<b>- Input string</b>: atttt ++ ' ' " " " " " + {}-´ñ+.'aAAAaaaa"  
<br> 
- see the console 
</div>
1

For Node.js users (or users using the Jade runtime in the browser), you can use Jade's escape function.

require('jade').runtime.escape(...);

There isn't any sense in writing it yourself if someone else is maintaining it. :)

Peter Mortensen
BMiner
0
function htmlspecialchars(str) {
  if (typeof str == "string") {
    str = str.replace(/&/g, "&amp;"); /* must do &amp; first */
    str = str.replace(/"/g, "&quot;");
    str = str.replace(/'/g, "&#039;");
    str = str.replace(/</g, "&lt;");
    str = str.replace(/>/g, "&gt;");
  }
  return str;
}
0

I hope this wins the race due to its performance and, most importantly, because it doesn't use chained logic like .replace('&','&amp;').replace('<','&lt;')...

var mapObj = {
   '&':  "&amp;",
   '<':  "&lt;",
   '>':  "&gt;",
   '"':  "&quot;",
   '\'': "&#039;"
};
// None of the keys are regex metacharacters, so joining them with "|" is safe.
// No "i" flag or toLowerCase() is needed: none of the keys are letters.
var re = new RegExp(Object.keys(mapObj).join("|"), "g");

function escapeHtml(str)
{
    return str.replace(re, function(matched)
    {
        return mapObj[matched];
    });
}

// The inner single quotes are escaped so these string literals are valid JavaScript.
console.log('<script type="text/javascript">alert(\'Hello World\');</script>');
console.log(escapeHtml('<script type="text/javascript">alert(\'Hello World\');</script>'));
Peter Mortensen
Airy
0

This isn't directly related to this question, but the reverse (turning the code from a numeric character reference such as &#8212; back into its character) can be accomplished in JS through:

> String.fromCharCode(8212);
"—"

That also works with TypeScript.
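Building on that, here is a sketch that decodes decimal (&#8212;) and hexadecimal (&#x2014;) numeric character references inside a string; decodeNumericRefs is a hypothetical name. It uses String.fromCodePoint rather than fromCharCode so that references above U+FFFF (emoji, for example) also work. It ignores named entities like &amp; and only guards against out-of-range values, so treat it as an illustration rather than a full HTML decoder.

```javascript
function decodeNumericRefs(str) {
  return str.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    // Exactly one of the two capture groups is defined for each match.
    var code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references beyond U+10FFFF untouched instead of throwing a RangeError.
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}

console.log(decodeNumericRefs('a &#60;b&#62;')); // a <b>
```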

Philippe Fanaro
-1

Reversed one:

function decodeHtml(text) {
    return text
        .replace(/&amp;/g, '&')
        .replace(/&lt;/ , '<')
        .replace(/&gt;/, '>')
        .replace(/&quot;/g,'"')
        .replace(/&#039;/g,"'");
}
rocambille
Gleb Dolzikov
  • The question isn't asking how to decode entities. This does the opposite of what the question is asking for. – Quentin Jan 13 '17 at 12:47
  • This will only replace the **first** instances of `&lt;` and `&gt;` in a string. – Quentin Jan 13 '17 at 12:47
  • This will only decode the five characters which (outside of non-Unicode documents) *must* be escaped; it won't decode ones which *may* be escaped. – Quentin Jan 13 '17 at 12:48
  • This doesn't take into account the rules for when the semicolon is optional. – Quentin Jan 13 '17 at 12:48
  • If the HTML says: `To write a greater than sign in HTML type &amp;gt;`, it will incorrectly display `>` instead of `&gt;` – Quentin Jan 13 '17 at 12:49
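Here is a sketch of a corrected chained decoder that addresses the second and last points in these comments: every regular expression has the g flag, and &amp; is decoded last so that escaped text such as &amp;gt; becomes the literal &gt; rather than >. Like the original, it still handles only these five entities and is not a general HTML entity decoder.

```javascript
function decodeHtml(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    // &amp; last: decoding it first would turn "&amp;gt;" into "&gt;" and then into ">".
    .replace(/&amp;/g, '&');
}

console.log(decodeHtml('type &amp;gt;')); // type &gt;
```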