textContent returns everything correctly, because &#8203; is the Unicode character 'ZERO WIDTH SPACE' (U+200B), which is:
commonly abbreviated ZWSP
this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
This is easy to verify:
var div = document.createElement('div');
div.innerHTML = '&#8203;xXx';
console.log( div.textContent ); // "xXx" (the leading U+200B is present but invisible)
console.log( div.textContent.length ); // 4
console.log( div.textContent[0].charCodeAt(0) ); // 8203
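Because the character is invisible, it can help to print the code point of every character instead of the string itself. The following is only a minimal sketch that needs no DOM, assuming an ES2015 environment; the helper name listCodePoints is made up for this illustration:

```javascript
// Hypothetical helper: lists the code point of every character in a string.
function listCodePoints(text) {
    // Array.from iterates by code point, so surrogate pairs stay together.
    return Array.from(text, function(ch) {
        var hex = ch.codePointAt(0).toString(16).toUpperCase();
        // Left-pad to at least four hex digits, e.g. "58" -> "0058".
        while (hex.length < 4) hex = '0' + hex;
        return 'U+' + hex;
    });
}

// '\u200B' is the same character that &#8203; decodes to.
console.log( listCodePoints('\u200BxXx').join(' ') ); // "U+200B U+0078 U+0058 U+0078"
```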
As Eugen Timm mentioned in his answer, it is a bit tricky to convert Unicode characters back to HTML entities, and his solution is completely valid for non-standard characters with a char code higher than 1000. As an alternative, I can propose a shorter RegExp solution that gives the same result:
var result = div.textContent.replace(/./g, function(x) {
    // Replace every character whose char code is above 1000 with a numeric entity
    var code = x.charCodeAt(0);
    return code > 1e3 ? '&#' + code + ';' : x;
});
console.log( result ); // "&#8203;xXx"
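One caveat with the replace call above: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) would be turned into two surrogate entities, which are not valid in HTML. A possible variant that works on whole code points is sketched below, assuming an ES2015 environment. The function name toNumericEntities and its threshold parameter are made up for this illustration, and the 1000 cut-off is simply carried over from the snippet above:

```javascript
// Hypothetical helper: converts every character whose code point is above
// the threshold into a numeric HTML entity.
function toNumericEntities(text, threshold) {
    if (threshold === undefined) threshold = 1000; // same cut-off as above
    // With the "u" flag the regex matches whole code points instead of
    // UTF-16 code units; [\s\S] (unlike ".") also visits line breaks.
    return text.replace(/[\s\S]/gu, function(ch) {
        var code = ch.codePointAt(0);
        return code > threshold ? '&#' + code + ';' : ch;
    });
}

console.log( toNumericEntities('\u200BxXx') ); // "&#8203;xXx"
console.log( toNumericEntities('\u{1F600}') ); // "&#128512;"
```

It takes a plain string, so it can be called as toNumericEntities(div.textContent) in the browser. Like the original snippet, it does not escape &, < or >.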
For a better solution you may have a look at this answer which can handle all HTML special characters.