Can hexadecimal html be systematically converted to unicode via Javascript?

Question

I have the following string for example:

"Hi I am testing a weird character &#366;, its a U with a circle"

Now my string uses the HTML code &#366; to display the U-circle. However, I need this to be in Unicode format, i.e. \u016E. Is there any good systematic way to do this with plain vanilla JavaScript?

See http://stackoverflow.com/questions/2808368/converting-html-entities-to-unicode-character-in-javascript — Stefano Sanfilippo, May 06 '13 at 14:15
What is "Unicode format"? You mean `U+016E` or its Javascript equivalent, `\u016E`? Or just the encoding the HTML file uses (i.e. the character itself)? By the way, `&#366;` is not hexadecimal. — Mr Lister, May 06 '13 at 14:18
The problem with the answers to the question linked above is that unless you're in a browser, none of them addresses decoding numeric entities. — T.J. Crowder, May 06 '13 at 14:21

nwellnhof · Accepted Answer · 2020-08-18T01:50:39.740

If you want to convert numeric HTML character references to Unicode escape sequences, try the following (it doesn't work with code points above 0xFFFF):

function convertCharRefs(string) {
    return string
        // Decimal references such as &#366;
        .replace(/&#(\d+);/g, function(match, num) {
            var hex = parseInt(num, 10).toString(16);
            while (hex.length < 4) hex = '0' + hex;
            return "\\u" + hex;
        })
        // Hexadecimal references such as &#x16E;
        .replace(/&#x([A-Fa-f0-9]+);/g, function(match, hex) {
            while (hex.length < 4) hex = '0' + hex;
            return "\\u" + hex;
        });
}
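
The limitation mentioned above (code points beyond 0xFFFF) could be worked around by emitting a UTF-16 surrogate pair, i.e. two \uXXXX escapes. The following is only a sketch under that assumption; the names `toEscape` and `convertCharRefsWithSurrogates` are made up for illustration and are not part of the original answer. It also assumes `String.prototype.padStart` (ES2017) is available.

```javascript
// Hypothetical helper: format one UTF-16 code unit as a \uXXXX escape.
function toEscape(codeUnit) {
    return "\\u" + codeUnit.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical variant of convertCharRefs that also covers code points
// above 0xFFFF by splitting them into a high and a low surrogate.
function convertCharRefsWithSurrogates(string) {
    return string.replace(/&#(?:x([A-Fa-f0-9]+)|(\d+));/g, function(match, hex, dec) {
        var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // Beyond the Unicode range: leave the reference untouched.
        if (codePoint > 0x10FFFF) return match;
        // Basic Multilingual Plane: a single escape is enough.
        if (codePoint <= 0xFFFF) return toEscape(codePoint);
        // Supplementary planes: compute the surrogate pair.
        var offset = codePoint - 0x10000;
        var high = 0xD800 + (offset >> 10);
        var low = 0xDC00 + (offset & 0x3FF);
        return toEscape(high) + toEscape(low);
    });
}
```

With this sketch, `&#366;` and `&#x16E;` should both become `\u016E`, while a reference such as `&#x1F600;` becomes the pair `\uD83D\uDE00`.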

If you simply want to decode the character references:

function decodeCharRefs(string) {
    return string
        // Decimal references such as &#366;
        .replace(/&#(\d+);/g, function(match, num) {
            return String.fromCodePoint(parseInt(num, 10));
        })
        // Hexadecimal references such as &#x16E;
        .replace(/&#x([A-Fa-f0-9]+);/g, function(match, num) {
            return String.fromCodePoint(parseInt(num, 16));
        });
}

Both functions use String.prototype.replace with a function as the replacement.
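
To illustrate how a replacement function is called, here is a small sketch (the variable names `seen` and `result` are made up for this example). For each match, the callback receives the full match, then each captured group as a string, then the offset of the match; whatever it returns is substituted into the result.

```javascript
// Record what the callback receives for each match.
var seen = [];

var result = "a&#65;b&#66;".replace(/&#(\d+);/g, function(match, num, offset) {
    // `num` is the first capture group and arrives as a string, not a number.
    seen.push({ match: match, num: num, offset: offset });
    return "[" + num + "]";
});

console.log(result); // a[65]b[66]
console.log(seen);
```

Because captured groups arrive as strings, it is clearer to convert them explicitly with parseInt and the appropriate radix before doing arithmetic on them.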

Shouldn't `[A-Fa-f0-9]` be sufficient? — pishpish, Apr 09 '17 at 16:27
@changed Yes, feel free to edit my answer. — nwellnhof, Apr 09 '17 at 17:23
