1

How to decode HTML entities in an XHTML application?

For example, $("<div/>").html("&middot;").text() raises a JavaScript error.

The error is:

[Exception... "An invalid or illegal string was specified" code: "12" nsresult: "0x8053000c (SyntaxError)"

EDIT: By XHTML I mean a real XHTML application, served with the application/xhtml+xml Content-Type.

Alex Ivasyuv

2 Answers

1

You can either use document.createEntityReference or write the characters that your file encoding cannot represent as JavaScript escape sequences (\uXXXX). However, as pointed out in your comment, document.createEntityReference doesn't work in Firefox: https://developer.mozilla.org/fr/docs/DOM/document.createEntityReference
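As a minimal sketch of the escaping approach, assuming the character you want is the middle dot (U+00B7, the character that &middot; names in HTML); the variable names are only illustrative:

```javascript
// U+00B7 MIDDLE DOT is the character that the HTML entity &middot; stands for.
// Writing it as a JavaScript escape sequence avoids any markup parsing.
var middot = "\u00B7";

// The same character built from its numeric code point.
var sameMiddot = String.fromCharCode(0x00B7);

// In a browser, set it as text rather than as markup, for example:
// $("<div/>").text(middot);
```

Because the string never goes through the XML parser, no entity has to be declared for this to work.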

Alternatively, you can use a custom library such as php.js's html_entity_decode: http://phpjs.org/functions/html_entity_decode/

Julien Royer
  • document.createEntityReference doesn't work in Firefox or in Chrome. It looks like it is already deprecated. \x26middot; doesn't work either. – Alex Ivasyuv Nov 09 '12 at 15:49
  • @AlexIvasyuv: Unicode escape sequences in JS don't work with reference names; you have to get the corresponding code point first. – Julien Royer Nov 09 '12 at 15:55
  • Using html_entity_decode with http://phpjs.org/functions/get_html_translation_table/ works fine! – Alex Ivasyuv Nov 09 '12 at 16:02
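The table-based decoding discussed in these comments can be sketched roughly as follows. This is not php.js's code: ENTITY_CODE_POINTS and decodeEntities are hypothetical names, and the table is deliberately tiny, so treat it as an illustration of the idea (look up the code point for a reference name, then build the character from it).

```javascript
// Hypothetical, deliberately small name-to-code-point table. A real table
// (such as the one php.js builds) covers many more HTML entity names.
var ENTITY_CODE_POINTS = {
  amp: 0x26, lt: 0x3C, gt: 0x3E, quot: 0x22, apos: 0x27,
  nbsp: 0xA0, copy: 0xA9, middot: 0xB7
};

// Hypothetical helper: replaces named, decimal and hexadecimal references
// in a single pass, so already-decoded output is never decoded twice.
function decodeEntities(input) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return input.replace(pattern, function (match, body) {
    var codePoint;
    if (body.charAt(0) === "#") {
      var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
      codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    } else if (Object.prototype.hasOwnProperty.call(ENTITY_CODE_POINTS, body)) {
      codePoint = ENTITY_CODE_POINTS[body];
    } else {
      return match; // unknown name: leave the reference untouched
    }
    if (codePoint > 0x10FFFF) {
      return match; // outside the Unicode range: leave untouched
    }
    return String.fromCodePoint(codePoint);
  });
}
```

The result is a plain string, so it can be assigned with text() or textContent without the XML parser ever seeing an undeclared entity.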
0

Try using the plain JavaScript innerHTML property instead of jQuery's html() method:

var elem = document.createElement('div');
elem.innerHTML = '&middot;';

var text = $(elem).text();

alert(text);

If jQuery has issues with XHTML, you can avoid it completely. Instead of jQuery's text(), you can use the textContent property (for obsolete versions of IE, innerText can be used).
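A minimal sketch of that fallback, assuming a hypothetical helper named readText that receives an element (or any object exposing one of the two properties):

```javascript
// Hypothetical helper: prefer the standard textContent property and fall
// back to innerText for old IE versions that lack textContent.
function readText(elem) {
  return "textContent" in elem ? elem.textContent : elem.innerText;
}

// In a browser it could replace $(elem).text() from the snippet above:
// var text = readText(elem);
```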

In general, it makes sense to decode entities on the server side. For example, PHP has a standard function for this purpose: html_entity_decode().

Marat Tanalin