HTML Entity Decode

Question

How do I encode and decode HTML entities using JavaScript or JQuery?

var varTitle = "Chris&apos; corner";

I want it to be:

var varTitle = "Chris' corner";

See [this answer](http://stackoverflow.com/questions/7394748/whats-the-right-way-to-decode-a-string-that-has-special-html-entities-in-it?lq=1). Seems better than what is offered below. — Déjà vu, Feb 12 '14 at 09:00
Also see the ent module (on npm!) https://github.com/substack/node-ent — TehShrike, Oct 06 '15 at 00:54
I think @ringø (wow, oddly similar username...) meant to link to [this answer](http://stackoverflow.com/a/7394787/114558) — rinogo, Oct 05 '16 at 21:35
@rinogo I thought [this was the better answer](http://stackoverflow.com/a/35915311/441930). apparently the [he](https://www.npmjs.com/package/he) lib is designed for exactly this purpose. You might be able to save a few lines of code with a custom implementation like most of the answers here, but they all have limitations one way or another. — Mr5o1, Mar 26 '17 at 23:12
A more concise way: https://stackoverflow.com/a/64587244/9854149 — weiya ou, Oct 29 '20 at 08:14

score 313 · Answer 1 · edited May 23 '17 at 11:54

I recommend against using the jQuery code that was accepted as the answer. While it does not insert the string to decode into the page, it does cause things such as scripts and HTML elements to get created. This is way more code than we need. Instead, I suggest using a safer, more optimized function.

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('div');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      // strip script/html tags
      str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
      str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

http://jsfiddle.net/LYteC/4/

To use this function, just call decodeEntities("&amp;"). It uses the same underlying technique as the jQuery version, but without jQuery's overhead and after stripping HTML tags from the input. See Mike Samuel's comment on the accepted answer for how to filter out HTML tags.
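
The tag-stripping step is worth testing before relying on it for untrusted input. The sketch below is an illustration, not part of the original answer: it applies the same two regexes in isolation (the helper name `stripTags` is made up here), so it runs without a DOM. Because `replace()` does not rescan its own output, a tag nested inside another tag's opening bracket survives the pass. In browsers, an `<img>` built through `innerHTML` can fire `onerror` even while detached, which is the concern raised in the comments.

```javascript
// Hypothetical helper: only the two replace() calls from decodeHTMLEntities.
function stripTags(str) {
  str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
  str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
  return str;
}

// Ordinary markup is removed as intended.
console.log(stripTags('<b>bold</b> text')); // "bold text"

// Only the inner "<img>" is matched and removed; the leftover pieces
// join up into a complete tag that would then be assigned to innerHTML.
console.log(stripTags('<<img>img src=x onerror=alert(1)>'));
// "<img src=x onerror=alert(1)>"
```

If the input may come from users, it seems safer to treat this function as a convenience for trusted strings only.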

This function can be easily used as a jQuery plugin by adding the following line in your project.

jQuery.decodeEntities = decodeEntities;

edited May 23 '17 at 11:54 by Community · answered Mar 07 '12 at 21:36 by Robert K

Can someone tell me what str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, ''); does? – PoeHaH Jan 17 '13 at 16:55
@PoeHaH It strips out all html tags, both opening and closing. – Robert K Jan 17 '13 at 18:44
9

Note: textContent is not supported in IE8, so if that's still one of your targeted browsers, you have to find another solution. I just wasted an hour trying to figure that out, since we need to decode entities specifically to compensate for another IE8 bug. – Greg Charles Sep 10 '13 at 21:57
@GregCharles I don't know of any good alternatives for IE8 and lower. There's no particularly convenient way to get the text content of a node without textContent. – Robert K Sep 12 '13 at 13:54
@RobertK -- I was able to do it with: jQuery('<div/>').html(str).text();, so the jQuery folks figured out a way. I have jQuery on the page anyway, but if it really needed to be done without it, you could step into the code and see how they did it. – Greg Charles Sep 16 '13 at 20:42
@GregCharles -- Using the `jQuery('<div/>').html(str).text();` method is not safe. You can include JavaScript in the string which will be run. That's why this function strips out HTML and JavaScript. – Gavin Feb 19 '14 at 18:28
Doesn't IE allow for `innerText` in lieu of `textContent` though? – David Thomas Feb 24 '14 at 22:26
According to Quirksmode, IE does allow for `innerText`. I'll play around with it a little later and verify that before I update my answer. – Robert K Feb 25 '14 at 14:04
6

Careful with the line that takes out HTML tags. You shouldn't be using regex with HTML/XML. Bobince has made this clear for ages. – Qix - MONICA WAS MISTREATED Feb 12 '15 at 23:42
@EruPenkman I don't understand why you posted that. What's your point? – Robert K Jul 08 '15 at 12:51
is there a reason why you chose to create a block element (div) instead of an inline one? ... Asking as someone who successfully wanted HTML entities in a Javascript-driven placeholder and used a instead of a div to avoid any height:auto related issues... – vzR Sep 15 '17 at 13:48
@vzR Returning node.textContent is similar to node.innerText, and won't contain the node's tag. Therefore, changing the element you write to has no effect on the returned result. – Robert K Sep 15 '17 at 19:40
1

While this is nice, users should be aware that it is pretty dangerous. It may appear that it properly strips "dangerous stuff" out, but it can be easily defeated. Do not use this on untrusted user input unless you like getting xss attacked. – goat Dec 06 '17 at 23:14
3

@Qix I don't completely understand the problem here. HTML/XML should certainly not be "parsed with regexes" as people so often do. If all you're trying to do is tokenize it, then AFAIK regexes are exactly an ideal solution. Unless I'm missing something, stripping the tags completely shouldn't require anything beyond lexical analysis and thus there'd be no benefit to going beyond regexes here. – Darren Ringer May 09 '18 at 18:40
I greatly simplified this code, see [my answer](https://stackoverflow.com/a/55142351/5286034). – Илья Зеленько Mar 13 '19 at 12:56
Can someone tell me why this code doesn't work nicely with ```'```? It always returns ```\'```, putting a random slash in front. I tried ```"``` and the code returned ```"``` just fine. Here's the string I'm using: ```"Silhouette", a song performed by the group 'KANA-BOON' is featured as the sixteenth opening of which anime?``` – Long Vuong Jan 23 '22 at 03:29
**CAUTION** : The regular expression method to remove `` will pass just because of the space after the opening `<` char. At a minimum, I would use `/<\s*script[^>]*>(.*?)<\s*\/\s*script\s*>/gis` to be a bit safer. But I wouldn't be surprised if someone manages to trick it. @Qix-MONICAWASMISTREATED is totally right, it's dangerous to try and solve this with a regular expression. Same problem with your second regex: `< img onload="alert('xss')" href="/wrong-url" />` passes without any problem :-( – Patrick Janser Aug 07 '23 at 13:42
This answer should be edited to mention the security risks it has. As [mentioned clearly here](https://stackoverflow.com/questions/1147359/how-to-decode-html-entities-using-jquery/1395954#1395954), we should never create a `<div>` element to solve the problem as it will execute some JS. It's not the case if we create a `<textarea>` element. – Patrick Janser Aug 07 '23 at 13:53

score 260 · Accepted Answer · edited Feb 14 '17 at 20:05

You could try something like:

var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

JS Fiddle.

A more interactive version:

$('form').submit(function() {
  var theString = $('#string').val();
  var varTitle = $('<textarea />').html(theString).text();
  $('#output').text(varTitle);
  return false;
});

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form action="#" method="post">
  <fieldset>
    <label for="string">Enter a html-encoded string to decode</label>
    <input type="text" name="string" id="string" />
  </fieldset>
  <fieldset>
    <input type="submit" value="decode" />
  </fieldset>
</form>

<div id="output"></div>

JS Fiddle.

edited Feb 14 '17 at 20:05 by H. Pauwelyn · answered Apr 26 '11 at 21:35 by David Thomas

Cool that works. So just curious, the $('div />') is used to create a <div> element around the varTitle? – chris Apr 26 '11 at 21:38
6

@chris and @david - This code creates an empty div (detached from the DOM), sets its innerHTML, and finally retrieves the content back as normal text. It's not *surrounding it with a DIV*, but *putting it in a div*. I'm putting some emphasis on this since it's crucial to understand how jQuery works. – Christian Apr 26 '11 at 21:56
41

Do NOT use this with untrusted data, see Mike's comment here: http://stackoverflow.com/questions/1147359/how-to-decode-html-entities-using-jquery#comment6018122_2419664 – Samuel Katz Jun 24 '12 at 02:57
It's good, but it doesn't always work: it works within a div tag, but if you type in some PHP or CSS without the HTML, for example, it lets it through. – Paul Ledger Dec 10 '13 at 23:53
5

just chiming in. this is vulnerable to xss attacks, try them! https://stackoverflow.com/questions/31282274/exploiting-jquery-html-encoding-xss – actual_kangaroo Jul 08 '15 at 02:10
How we can use this for Titanium Appcelerator? – MobileGeek Jul 31 '15 at 12:27
`$('<textarea />').html("Chris&apos; corner").text();` will this attach a `textarea` to document? – Manish Kumar Feb 03 '17 at 13:18
it is jQuery only... – insign Sep 24 '17 at 14:37
I'ts cool. But, please - compare execute results `jQuery('').html('hi').text()` and `jQuery('').html('hi').text()` : it is underwater stones – Alexander Goncharov Oct 08 '19 at 17:46
2

This can be vulnerable to XSS attacks for older jQuery versions ([see more here](https://stackoverflow.com/a/1395954/6476044)). I would suggest using [he library](https://github.com/mathiasbynens/he) instead. You can see code examples in another [answer to similar question](https://stackoverflow.com/a/23596964/6476044). – ands Oct 24 '19 at 20:30

score 132 · Answer 3 · edited Dec 23 '14 at 23:36

Like Robert K said, don't use jQuery.html().text() to decode html entities as it's unsafe because user input should never have access to the DOM. Read about XSS for why this is unsafe.

Instead try the Underscore.js utility-belt library which comes with escape and unescape methods:

_.escape(string)

Escapes a string for insertion into HTML, replacing &, <, >, ", `, and ' characters.

_.escape('Curly, Larry & Moe');
=> "Curly, Larry &amp; Moe"

_.unescape(string)

The opposite of escape, replaces &amp;, &lt;, &gt;, &quot;, &#96; and &#x27; with their unescaped counterparts.

_.unescape('Curly, Larry &amp; Moe');
=> "Curly, Larry & Moe"

To support decoding more characters, just copy the Underscore unescape method and add more characters to the map.
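
As a rough illustration of that last suggestion, here is a DOM-free sketch that keeps a single escape map and derives the unescape map by inverting it, so both directions stay in sync when entries are added. It is written in the spirit of Underscore's approach rather than copied from its source. The names `escapeMap`, `createReplacer`, `escapeHtml` and `unescapeHtml` are assumptions made for this example, and the `&copy;` entry is an arbitrary extra.

```javascript
// Assumption: a hand-rolled map, not Underscore's actual source.
const escapeMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;',
  '\u00A9': '&copy;' // extra entry added for illustration
};

// Invert the map so that both directions stay in sync.
const unescapeMap = Object.fromEntries(
  Object.entries(escapeMap).map(([ch, entity]) => [entity, ch])
);

// Build a function that swaps every key of `map` for its value in one pass.
function createReplacer(map) {
  const source = Object.keys(map)
    .map((key) => key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  const pattern = new RegExp(source, 'g');
  return (text) => String(text).replace(pattern, (match) => map[match]);
}

const escapeHtml = createReplacer(escapeMap);
const unescapeHtml = createReplacer(unescapeMap);

console.log(escapeHtml('Curly, Larry & Moe'));       // "Curly, Larry &amp; Moe"
console.log(unescapeHtml('Curly, Larry &amp; Moe')); // "Curly, Larry & Moe"
console.log(unescapeHtml('&copy; 2014'));            // "© 2014"
```

Anything not listed in the map is left untouched, which is the limitation the comments below point out.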

edited Dec 23 '14 at 23:36 by kevinAlbs · answered Jul 30 '12 at 04:27 by Alan Hamlett

TypeError: _.unescape is not a function – chovy Oct 21 '12 at 08:52
2

@chovy, use the latest Underscore.js version >= 1.4.2 and you won't get a TypeError. – Alan Hamlett Oct 21 '12 at 23:01
This is basically the only that works for me. I need to find script tags containing templates, and then search through those to find subsections of those templates. When doing this with jQuery, jQ converts all "invalid" html (aka the template tags) to entities. In order to retrieve the subsections, they need to be unescaped again, and for that, none of the other answers work. – oligofren Jun 14 '13 at 08:29
3

I like this answer because it doesn't require a DOM, and nowadays who can guarantee access to the DOM API when writing javascript? Unfortunately it only works for the listed entities, and leaves things like untouched. – trey-jones Aug 20 '14 at 14:06
I like the spirit of this answer. Find a library that does it, use it, or go their source, and copy the logic. – David Gilbertson Oct 03 '14 at 10:19
1

+1 for using a source-controlled library rather than copying and pasting some random code from the top stack overflow answer. If only the javascript standard library had these kind of low-level functions. – Michael Bylstra Oct 22 '14 at 06:57
5

Keep in mind that it does not unencode encoded russian or japanese characters. e.g. ハローワールド -> ハローワールド cannot be done with this – daghan Nov 11 '14 at 12:48
7

`_.unescape` only works for [a handful of values](http://underscorejs.org/docs/underscore.html#section-160). So something like `_.unescape('»')` for example will just return `"»"` – Dylan Nov 23 '15 at 17:53
1

this doesn't work with lodash – chovy Nov 25 '19 at 05:32

insign · Answer 4 · 2020-05-02T03:15:10.487

105

Original author answer here.

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

edited May 02 '20 at 03:15 · answered Feb 11 '17 at 23:07 by insign

2

This method works every where even when jquery is not available or not loaded yet, because its pure javascript. – IamSalik Apr 03 '18 at 07:57
2

Is there any drawback to this technique? It seems way easier than the answers above. – anthonygood Dec 09 '18 at 20:00
@anthonygood I don't think so. – insign Dec 10 '18 at 03:36
1

@anthonygood each time the function creates a new object (DOM element) – Илья Зеленько Mar 13 '19 at 12:18
1

You could wrap this in an immediately-invoked function expression so that the DOM element is only created once: `const decodeHTMLEntities = (() => { const textArea = document.createElement('textarea'); return (message: string): string => { textArea.innerHTML = message; return textArea.value; }; })();` – jessepinho May 01 '19 at 18:18
4

Next time @insign please credit the original author or give a link to it. https://stackoverflow.com/a/7394787 – geauser May 21 '19 at 16:27
1

@geauser yes, done – insign May 02 '20 at 03:15
1

This is the best and most useful answer in the whole list. Thanks @insign. – Samagra Singh Tomar Mar 26 '21 at 05:33
This is the reason people hate JS :-D – Lizozom Feb 28 '23 at 18:08

score 59 · Answer 5 · edited May 23 '23 at 12:38

Here's a quick method that doesn't require creating a div, and decodes the "most common" HTML escaped chars:

function decodeHTMLEntities(text) {
    var entities = [
        ['apos', '\''],
        ['#x27', '\''],
        ['#x2F', '/'],
        ['#39', '\''],
        ['#47', '/'],
        ['lt', '<'],
        ['gt', '>'],
        ['nbsp', ' '],
        ['quot', '"'],
        // 'amp' goes last so that "&amp;lt;" decodes to "&lt;", not "<"
        ['amp', '&']
    ];

    for (var i = 0, max = entities.length; i < max; ++i)
        text = text.replace(new RegExp('&' + entities[i][0] + ';', 'g'), entities[i][1]);

    return text;
}

console.log(decodeHTMLEntities('&amp; &quot;'));
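
One thing to watch with sequential replacement is order: whichever entity is decoded first can produce text that a later replacement decodes again. The sketch below contrasts a loop of `replace()` calls with a single `replace()` over one alternation (the idea suggested in the comments), where every match is decoded exactly once. The names `ENTITIES`, `decodeSequential` and `decodeSinglePass` are illustrative only.

```javascript
const ENTITIES = { amp: '&', apos: "'", lt: '<', gt: '>', quot: '"' };

// Sequential: one replace() per entity, in insertion order ('amp' first).
function decodeSequential(text) {
  for (const [name, ch] of Object.entries(ENTITIES)) {
    text = text.replace(new RegExp('&' + name + ';', 'g'), ch);
  }
  return text;
}

// Single pass: one regex with alternation, each match decoded once.
const pattern = new RegExp('&(' + Object.keys(ENTITIES).join('|') + ');', 'g');
function decodeSinglePass(text) {
  return text.replace(pattern, (match, name) => ENTITIES[name]);
}

const input = 'Type &amp;lt; to show &lt;';
console.log(decodeSequential(input)); // "Type < to show <" (first part decoded twice)
console.log(decodeSinglePass(input)); // "Type &lt; to show <"
```

Whether the single pass is also faster would need benchmarking; the point here is only that its result does not depend on the order of the entries.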

edited May 23 '23 at 12:38 by AndyDaSilva52 · answered Jun 12 '13 at 22:01 by William Lahti

15

Your answer doesn't work at all for most html entities, and expanding it to include them would be pretty repetitive and error-prone. E.g., there's an entity for each Japanese kanji character, of which there are thousands. Plus by that point, I wouldn't be surprised if your answer was slower than some of the others here, since you'd be running thousands of replaces with thousands of regexes for each string to decode. – mmitchell Aug 21 '13 at 21:54
2

It really depends on your PURPOSE when you are encoding these strings. If your goal is to have it not trigger HTML processing via things like < or > it is entirely unnecessary to encode the other characters via the character entity syntax. The extensive amount of character entities serve mostly as a convenience tool. The entities I have listed are the bare minimum of ones you must escape to avoid having the data get mixed up with HTML. [Continued in next comment] – William Lahti Aug 23 '13 at 15:34
1

As for the speed thing, good point on having run multiple regexes. But of course since your idea of putting every character entity into that code is pointless and frankly, really stupid, this is not an issue. One could however generate the regex using the | character first and do a single replace() call. I think you'd have to benchmark it to see which is faster, but my gut says it'll be faster to use | with one replace() due to function call overhead being high in Javascript. – William Lahti Aug 23 '13 at 15:34
1

Right, so your solution is incomplete. The OP never said why they were encoding their HTML entities so if you were making an assumption on that front, it probably should have been noted in the answer. – mmitchell Aug 23 '13 at 16:32
1

This is complete when you're trying to replicate htmlspecialchars_decode in javascript. It does not pretend to replicate html_entity_decode. I find there's alot of noise on this topic and many bloated/insecure methods. This is the decode companion to the excellent encode solutions provided by Kip and Chris Jacob: http://stackoverflow.com/questions/1787322/htmlspecialchars-equivalent-in-javascript – MichaelClark Jan 31 '16 at 00:36
1

I was working with decoding reddit comments for a Vue.js app v-html directive, and it worked great. Thank you! – Max Pekarsky Nov 09 '18 at 22:41
You should also replace ↵ with blank spaces. – Renan Coelho Jun 28 '20 at 18:54
I've used this for my Google Apps Script, because neither `Document` nor any DOM manipulation is available there. Thanks – ashen madusanka Apr 01 '21 at 05:54

score 37 · Answer 6 · answered Apr 07 '17 at 15:33

here is another version:

function convertHTMLEntity(text){
    const span = document.createElement('span');

    return text
    .replace(/&[#A-Za-z0-9]+;/gi, (entity,position,text)=> {
        span.innerHTML = entity;
        return span.innerText;
    });
}

console.log(convertHTMLEntity('Large &lt; &#163; 500'));

answered Apr 07 '17 at 15:33 by Mirodil

Since you are matching both `A-Z` and `a-z`, is the case insensitive option needed ? – tigrou Jul 16 '19 at 14:42
@tigrou No, You can remove that option. – Mirodil Jul 16 '19 at 18:06
this is the best version thanks ! – Karambit May 07 '21 at 18:57
This is the best answer. I will upvote it. – Piyush May 09 '22 at 13:58
Excellent solution, but still not sure how its working. Please explain your code. – MosesK May 24 '22 at 07:49

score 22 · Answer 7 · answered Oct 26 '12 at 17:02

Inspired by Robert K's solution, this version does not strip HTML tags, and is just as secure.

var decode_entities = (function() {
    // Remove HTML Entities
    var element = document.createElement('div');

    function decode_HTML_entities (str) {

        if(str && typeof str === 'string') {

            // Escape HTML before decoding for HTML Entities
            str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';');

            element.innerHTML = str;
            if(element.innerText){
                str = element.innerText;
                element.innerText = '';
            }else{
                // Firefox support
                str = element.textContent;
                element.textContent = '';
            }
        }
        return unescape(str);
    }
    return decode_HTML_entities;
})();

Those `escape()` and `unescape()` functions are deprecated. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/escape — Ben Creasy, Mar 07 '16 at 22:14

score 16 · Answer 8 · answered Feb 16 '15 at 14:24

jQuery provides a way to encode and decode html entities.

If you use a "<div/>" tag, it will strip out all the html.

function htmlDecode(value) {
    return $("<div/>").html(value).text();
}

function htmlEncode(value) {
    return $('<div/>').text(value).html();
}

If you use a "<textarea/>" tag, it will preserve the html tags.

function htmlDecode(value) {
    return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
    return $('<textarea/>').text(value).html();
}

answered Feb 16 '15 at 14:24 by Jason Williams

1

Love it, works for me, tested it in the Chrome console and indeed the – OzzyTheGiant Oct 22 '15 at 19:12
1

I also prefer this solution. A pure JavaScript way to do it is creating a div with `var div = document.createElement('div');` and then setting `innerHTML` and getting `innerText` to unescape; vice-versa for escaping. – bozdoz Aug 06 '17 at 14:44
jQuery `text()` will strip html if it's invalid, like for ex. when using table rows. – Tim Vermaelen Aug 24 '22 at 00:12

score 13 · Answer 9 · answered Mar 04 '15 at 05:42

To add yet another "inspired by Robert K" to the list, here is another safe version which does not strip HTML tags. Instead of running the whole string through the HTML parser, it pulls out only the entities and converts those.

var decodeEntities = (function() {
    // this prevents any overhead from creating the object each time
    var element = document.createElement('div');

    // regular expression matching HTML entities
    var entity = /&(?:#x[a-f0-9]+|#[0-9]+|[a-z0-9]+);?/ig;

    return function decodeHTMLEntities(str) {
        // find and replace all the html entities
        str = str.replace(entity, function(m) {
            element.innerHTML = m;
            return element.textContent;
        });

        // reset the value
        element.textContent = '';

        return str;
    }
})();

VyvIT · Answer 10 · 2015-06-18T13:22:27.770

Inspired by Robert K's solution, this version strips HTML tags and prevents scripts and event handlers from executing, for example: <img src=fake onerror="prompt(1)"> Tested on the latest Chrome, FF, and IE (should work from IE9, but I haven't tested that).

var decodeEntities = (function () {
        //create a new html document (doesn't execute script tags in child elements)
        var doc = document.implementation.createHTMLDocument("");
        var element = doc.createElement('div');

        function getText(str) {
            element.innerHTML = str;
            str = element.textContent;
            element.textContent = '';
            return str;
        }

        function decodeHTMLEntities(str) {
            if (str && typeof str === 'string') {
                var x = getText(str);
                while (str !== x) {
                    str = x;
                    x = getText(x);
                }
                return x;
            }
            // return empty or non-string input unchanged
            return str;
        }
        return decodeHTMLEntities;
    })();

Simply call:

decodeEntities('<img src=fake onerror="prompt(1)">');
decodeEntities("<script>alert('aaa!')</script>");

score 11 · Answer 11 · answered Jun 24 '16 at 01:57

Here is a full version

function htmldecode(s){
    window.HTML_ESC_MAP = {
    "nbsp":" ","iexcl":"¡","cent":"¢","pound":"£","curren":"¤","yen":"¥","brvbar":"¦","sect":"§","uml":"¨","copy":"©","ordf":"ª","laquo":"«","not":"¬","reg":"®","macr":"¯","deg":"°","plusmn":"±","sup2":"²","sup3":"³","acute":"´","micro":"µ","para":"¶","middot":"·","cedil":"¸","sup1":"¹","ordm":"º","raquo":"»","frac14":"¼","frac12":"½","frac34":"¾","iquest":"¿","Agrave":"À","Aacute":"Á","Acirc":"Â","Atilde":"Ã","Auml":"Ä","Aring":"Å","AElig":"Æ","Ccedil":"Ç","Egrave":"È","Eacute":"É","Ecirc":"Ê","Euml":"Ë","Igrave":"Ì","Iacute":"Í","Icirc":"Î","Iuml":"Ï","ETH":"Ð","Ntilde":"Ñ","Ograve":"Ò","Oacute":"Ó","Ocirc":"Ô","Otilde":"Õ","Ouml":"Ö","times":"×","Oslash":"Ø","Ugrave":"Ù","Uacute":"Ú","Ucirc":"Û","Uuml":"Ü","Yacute":"Ý","THORN":"Þ","szlig":"ß","agrave":"à","aacute":"á","acirc":"â","atilde":"ã","auml":"ä","aring":"å","aelig":"æ","ccedil":"ç","egrave":"è","eacute":"é","ecirc":"ê","euml":"ë","igrave":"ì","iacute":"í","icirc":"î","iuml":"ï","eth":"ð","ntilde":"ñ","ograve":"ò","oacute":"ó","ocirc":"ô","otilde":"õ","ouml":"ö","divide":"÷","oslash":"ø","ugrave":"ù","uacute":"ú","ucirc":"û","uuml":"ü","yacute":"ý","thorn":"þ","yuml":"ÿ","fnof":"ƒ","Alpha":"Α","Beta":"Β","Gamma":"Γ","Delta":"Δ","Epsilon":"Ε","Zeta":"Ζ","Eta":"Η","Theta":"Θ","Iota":"Ι","Kappa":"Κ","Lambda":"Λ","Mu":"Μ","Nu":"Ν","Xi":"Ξ","Omicron":"Ο","Pi":"Π","Rho":"Ρ","Sigma":"Σ","Tau":"Τ","Upsilon":"Υ","Phi":"Φ","Chi":"Χ","Psi":"Ψ","Omega":"Ω","alpha":"α","beta":"β","gamma":"γ","delta":"δ","epsilon":"ε","zeta":"ζ","eta":"η","theta":"θ","iota":"ι","kappa":"κ","lambda":"λ","mu":"μ","nu":"ν","xi":"ξ","omicron":"ο","pi":"π","rho":"ρ","sigmaf":"ς","sigma":"σ","tau":"τ","upsilon":"υ","phi":"φ","chi":"χ","psi":"ψ","omega":"ω","thetasym":"ϑ","upsih":"ϒ","piv":"ϖ","bull":"•","hellip":"…","prime":"′","Prime":"″","oline":"‾","frasl":"⁄","weierp":"℘","image":"ℑ","real":"ℜ","trade":"™","alefsym":"ℵ","larr":"←","uarr":"↑","rarr":"→","darr":"↓","harr":"↔","crarr":"↵","lArr":"⇐","uArr":"⇑","rArr":"⇒","dArr":"⇓","hArr":"⇔",
    "forall":"∀","part":"∂","exist":"∃","empty":"∅","nabla":"∇","isin":"∈","notin":"∉","ni":"∋","prod":"∏","sum":"∑","minus":"−","lowast":"∗","radic":"√","prop":"∝","infin":"∞","ang":"∠","and":"∧","or":"∨","cap":"∩","cup":"∪","int":"∫","there4":"∴","sim":"∼","cong":"≅","asymp":"≈","ne":"≠","equiv":"≡","le":"≤","ge":"≥","sub":"⊂","sup":"⊃","nsub":"⊄","sube":"⊆","supe":"⊇","oplus":"⊕","otimes":"⊗","perp":"⊥","sdot":"⋅","lceil":"⌈","rceil":"⌉","lfloor":"⌊","rfloor":"⌋","lang":"〈","rang":"〉","loz":"◊","spades":"♠","clubs":"♣","hearts":"♥","diams":"♦","quot":"\"","amp":"&","lt":"<","gt":">","OElig":"Œ","oelig":"œ","Scaron":"Š","scaron":"š","Yuml":"Ÿ","circ":"ˆ","tilde":"˜","ndash":"–","mdash":"—","lsquo":"‘","rsquo":"’","sbquo":"‚","ldquo":"“","rdquo":"”","bdquo":"„","dagger":"†","Dagger":"‡","permil":"‰","lsaquo":"‹","rsaquo":"›","euro":"€"};
    if(!window.HTML_ESC_MAP_EXP)
        window.HTML_ESC_MAP_EXP = new RegExp("&("+Object.keys(HTML_ESC_MAP).join("|")+");","g");
    return s?s.replace(window.HTML_ESC_MAP_EXP,function(x){
        return HTML_ESC_MAP[x.substring(1,x.length-1)]||x;
    }):s;
}

Usage

htmldecode("&sum;&nbsp;&gt;&euro;");

be careful with nbsp char, I had to manually replace it because this example use a normal space. — Guillaume Malartre, Mar 23 '18 at 17:19

score 11 · Answer 12 · edited May 23 '23 at 09:58

A more functional approach to @William Lahti's answer:

var entities = {
    'amp': '&',
    'apos': '\'',
    '#x27': '\'',
    '#x2F': '/',
    '#39': '\'',
    '#47': '/',
    'lt': '<',
    'gt': '>',
    'nbsp': ' ',
    'quot': '"'
};

function decodeHTMLEntities(text) {
    return text.replace(/&([^;]+);/gm, function (match, entity) {
        return entities[entity] || match
    })
}

console.log(decodeHTMLEntities('Large &lt; &#163; 500'));
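
Note that `&#163;` in the example above is left untouched, because it is not in the map. Numeric character references can be handled generically instead of being listed one by one. A minimal sketch, assuming a small named-entity map plus `String.fromCodePoint` for the numeric forms; the names `namedEntities` and `decodeEntitiesNoDom` are made up for illustration. A real HTML parser also remaps some code points (for example 128 to 159), which this sketch does not attempt.

```javascript
const namedEntities = { amp: '&', apos: "'", lt: '<', gt: '>', nbsp: '\u00A0', quot: '"' };

function decodeEntitiesNoDom(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] !== '#') {
      // Named entity: look it up, leave unknown names as they are.
      return Object.prototype.hasOwnProperty.call(namedEntities, body)
        ? namedEntities[body]
        : match;
    }
    // Numeric reference: "#x..." is hexadecimal, "#..." is decimal.
    const isHex = body[1] === 'x' || body[1] === 'X';
    const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // Leave the text as it is if the value is not a usable code point.
    if (codePoint === 0 || codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeEntitiesNoDom('Large &lt; &#163; 500'));  // "Large < £ 500"
console.log(decodeEntitiesNoDom('&#x30CF;&#12525;&#228;')); // "ハロä"
```

This keeps the list of named entities short while still covering any character written as a numeric reference, which may be enough in environments where `document` is unavailable.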

edited May 23 '23 at 09:58 by AndyDaSilva52 · answered Jan 22 '17 at 14:01 by omerts

2

this doesn't address the problem of the decodeHTMLEntities('ä') or ä :) – Alejandro Vales Oct 03 '17 at 12:59
1

The list is surely not complete, it is just a rewrite of the accepted answer. You can add whatever you want to the entities list, just add '#228': 'ä'. – omerts Oct 17 '17 at 09:52
1

I think doing that for the ****** thousand special characters that could be there could mean the death :( – Alejandro Vales Oct 17 '17 at 12:31
If you need to support all chars, you are absolutely right. As I said, this is a rewrite of the accepted answer. – omerts Oct 18 '17 at 14:14
1

And incidentally, this is exactly what people like me need. I required a short list I could manage to put in a gatsby utility where document is unavailable. Wholly bulletproof isn't always the goal. – Kai Qing Aug 28 '19 at 16:24

score 10 · Answer 13 · edited May 23 '17 at 12:34

Injecting untrusted HTML into the page is dangerous as explained in How to decode HTML entities using jQuery?.

One alternative is to use a JavaScript-only implementation of PHP's html_entity_decode (from http://phpjs.org/functions/html_entity_decode:424). The example would then be something like:

var varTitle = html_entity_decode("Chris&apos; corner");

edited May 23 '17 at 12:34 by Community · answered Jan 04 '12 at 14:09 by Diogo Kollross

2

Actually, the current version of html_entity_decode doesn't handle &apos;. – studgeek Mar 21 '12 at 19:23

score 2 · Answer 14 · answered May 07 '12 at 13:30

I know I'm a bit late to the game, but I thought I might provide the following snippet as an example of how I decode HTML entities using jQuery:

var varTitleE = "Chris&apos; corner";
var varTitleD = $("<div/>").html(varTitleE).text();

console.log(varTitleE + " vs. " + varTitleD);

Don't forget to fire-up your inspector/firebug to see the console results -- or simply replace console.log(...) w/alert(...)

That said, here's what my console via the Google Chrome inspector read:

Chris&apos; corner vs. Chris' corner

Philip Kahn · Answer 15 · 2016-03-04T21:06:58.597

Because @Robert K and @mattcasey both have good code, I thought I'd contribute here with a CoffeeScript version, in case anyone in the future could use it:

    String::unescape = (strict = false) ->
      ###
      # Take escaped text, and return the unescaped version
      #
      # @param string str | String to be used
      # @param bool strict | Strict mode will remove all HTML
      #
      # Test it here:
      # https://jsfiddle.net/tigerhawkvok/t9pn1dn5/
      #
      # Code: https://gist.github.com/tigerhawkvok/285b8631ed6ebef4446d
      ###
      # Create a dummy element
      element = document.createElement("div")
      decodeHTMLEntities = (str) ->
        if str? and typeof str is "string"
          unless strict is true
            # escape HTML tags
            str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';')
          else
            str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '')
            str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '')
          element.innerHTML = str
          if element.innerText
            # Do we support innerText?
            str = element.innerText
            element.innerText = ""
          else
            # Firefox
            str = element.textContent
            element.textContent = ""
        unescape(str)
      # Remove encoded or double-encoded tags
      fixHtmlEncodings = (string) ->
        string = string.replace(/\&amp;#/mg, '&#') # The rest, for double-encodings
        string = string.replace(/\&quot;/mg, '"')
        string = string.replace(/\&quote;/mg, '"')
        string = string.replace(/\&#95;/mg, '_')
        string = string.replace(/\&#39;/mg, "'")
        string = string.replace(/\&#34;/mg, '"')
        string = string.replace(/\&#62;/mg, '>')
        string = string.replace(/\&#60;/mg, '<')
        string
      # Run it
      tmp = fixHtmlEncodings(this)
      decodeHTMLEntities(tmp)

See https://jsfiddle.net/tigerhawkvok/t9pn1dn5/7/ or https://gist.github.com/tigerhawkvok/285b8631ed6ebef4446d (includes compiled JS, and is probably updated compared to this answer)

score 0 · Answer 16 · answered May 14 '14 at 08:36

To do it in pure JavaScript, without jQuery or predefining every entity, you can cycle the encoded HTML string through an element's innerHTML and innerText (or textContent) properties, once for every decode step that is required:

<html>
  <head>
    <title>For every decode step, cycle through innerHTML and innerText </title>
    <script>
function decode(str) {
  var d = document.createElement("div");
  d.innerHTML = str; 
  return typeof d.innerText !== 'undefined' ? d.innerText : d.textContent;
}
    </script>
  </head>
  <body>
    <script>
var encodedString = "&lt;p&gt;name&lt;/p&gt;&lt;p&gt;&lt;span style=\"font-size:xx-small;\"&gt;ajde&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;da&lt;/em&gt;&lt;/p&gt;";
    </script>
    <input type=button onclick="document.body.innerHTML=decode(encodedString)"/>
  </body>
</html>

score -3 · Answer 17 · answered Aug 27 '13 at 14:27

I think that is the exact opposite of the solution chosen.

var decoded = $("<div/>").text(encodedStr).html();

Try it :)

answered Aug 27 '13 at 14:27 by Pedro

This method is not safe. You can include JavaScript in `encodedStr` which will be run. Use Robert K's method. – Gavin Feb 19 '14 at 18:29
