10

Is there any function to do the following?

var specialStr = 'ipsum áá éé lore';
var encodedStr = someFunction(specialStr);
// then encodedStr should be like 'ipsum \u00E1\u00E1 \u00E9\u00E9 lore'

I need to encode the characters that fall outside the ASCII range, and I need to do it with that particular encoding. I don't know its name. Is it Unicode, maybe?

Smi
Hanoi
  • @mplungjan this has nothing to do with URI encoding; neither of the linked questions do what the OP wants. – Domenic Sep 21 '11 at 12:16
  • See http://www.javascripter.net/faq/escape.htm or, even better, see https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Core_Language_Features#Unicode. – yoozer8 Sep 21 '11 at 12:17
  • Or here [Convert special characters to HTML in Javascript](http://stackoverflow.com/questions/784586/convert-special-characters-to-html-in-javascript) – mplungjan Sep 21 '11 at 12:17
  • @mplungjan you yet again seem to have failed to read the OP's question. – Domenic Sep 21 '11 at 12:20
  • @Domenic - granted, I deleted the first links but the last link is more relevant (not the accepted answer but some of the other answers), I object to "Yet again" – mplungjan Sep 21 '11 at 12:21

4 Answers

18

This should do the trick:

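// Left-pad a hex string with zeros to four digits, e.g. "e1" -> "00e1".
function padWithLeadingZeros(string) {
    return new Array(5 - string.length).join("0") + string;
}

// Build the six-character escape for one UTF-16 code unit, e.g. 225 -> "\u00e1".
function unicodeCharEscape(charCode) {
    return "\\u" + padWithLeadingZeros(charCode.toString(16));
}

// Escape every code unit above 127; ASCII characters pass through unchanged.
function unicodeEscape(string) {
    return string.split("")
                 .map(function (char) {
                     var charCode = char.charCodeAt(0);
                     return charCode > 127 ? unicodeCharEscape(charCode) : char;
                 })
                 .join("");
}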

For example:

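var assert = require("assert"); // assumes Node.js; any assert library with equal() works

var specialStr = 'ipsum áá éé lore';
var encodedStr = unicodeEscape(specialStr);

// Note that the hex digits come out in lowercase.
assert.equal("ipsum \\u00e1\\u00e1 \\u00e9\\u00e9 lore", encodedStr);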
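The question asks for uppercase hex digits (`\u00E1`), while the code above produces lowercase ones. As a minimal sketch of a variant, the following uses a single `replace` call over all non-ASCII UTF-16 code units and uppercases the hex. The name `unicodeEscapeUpper` is made up for this illustration and is not part of the answer above.

```javascript
// Hypothetical helper: escape every UTF-16 code unit above 0x7F as \uXXXX
// with uppercase hex digits. Without the "u" flag the regex matches one
// code unit at a time, so astral characters become two escapes
// (a surrogate pair).
function unicodeEscapeUpper(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16).toUpperCase();
        return "\\u" + ("0000" + hex).slice(-4); // zero-pad to four digits
    });
}

var encoded = unicodeEscapeUpper("ipsum áá éé lore");
console.log(encoded); // ipsum \u00E1\u00E1 \u00E9\u00E9 lore

// For input without quotes, backslashes or control characters, the result
// is valid JSON string content, so JSON.parse can undo it.
console.log(JSON.parse('"' + encoded + '"') === "ipsum áá éé lore"); // true
```

Because the escaping is done per code unit, a character such as U+1F600 comes out as `\uD83D\uDE00`, which JavaScript and JSON both read back as the original character.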
Domenic
3

If you need `\x` hex escapes rather than `\u` Unicode escapes, you can simplify @Domenic's answer to:

"aäßåfu".replace(/./g, function(c){return c.charCodeAt(0)<128?c:"\\x"+c.charCodeAt(0).toString(16)})

returns: "a\xe4\xdf\xe5fu"
Max Murphy
  • Do you know that the charcode can be larger than 255? `"ė".replace(/./g, function(c){return c.charCodeAt(0)<128?c:"\\x"+c.charCodeAt(0).toString(16)})` returns `\x117` and that will lead to trouble. – some Feb 23 '18 at 19:28
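One way to address the problem raised in the comment, sketched here under the assumption that the consumer accepts both `\x` and `\u` escapes (as JavaScript string literals do): use the two-digit `\x` form only for code units up to 0xFF and fall back to `\uXXXX` for anything larger. The name `hexEscape` is hypothetical.

```javascript
// Hypothetical helper: \xXX for code units 0x80..0xFF, \uXXXX above that.
function hexEscape(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        var code = ch.charCodeAt(0);
        var hex = code.toString(16);
        return code <= 0xff
            ? "\\x" + hex                        // 0x80..0xff is always two digits
            : "\\u" + ("0000" + hex).slice(-4);  // zero-pad to four digits
    });
}

console.log(hexEscape("aäßåfu")); // a\xe4\xdf\xe5fu
console.log(hexEscape("ė"));      // \u0117
```

A character class is used instead of `.` so that line breaks in the input are handled the same way as any other ASCII character (they are simply left alone).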
1

This works for me, specifically when using the Dropbox REST API:

function encodeNonAsciiCharacters(value: string): string {
    let out = "";
    for (let i = 0; i < value.length; i++) {
        const ch = value.charAt(i);
        const chn = ch.charCodeAt(0);
        if (chn <= 127) {
            out += ch; // ASCII passes through unchanged
        } else {
            // Non-ASCII code unit: emit \uXXXX, zero-padded to four hex digits
            const hex = ("0000" + chn.toString(16)).slice(-4);
            out += "\\u" + hex;
        }
    }
    return out;
}
Jens
1

For reference: you can do as Domenic said, or use the built-in `escape` function, but that produces a different, percent-based format (more browser friendly):

>>> escape("áéíóú");
"%E1%E9%ED%F3%FA"
fmsf
  • Interestingly enough: `escape("☃") === "%u2603"` while `escape("á") === "%E1"`. I wonder how they decide when to switch formats and add a `"u"` at the beginning... – Domenic Sep 21 '11 at 12:49
  • Ah, well, MDN says "The escape and unescape functions do not work properly for non-ASCII characters and have been deprecated.": https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Functions#escape_and_unescape_Functions so maybe that's the source of the inconsistency. – Domenic Sep 21 '11 at 12:49
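On the format switch discussed in the comments: as far as the legacy definition of `escape` goes, it works on UTF-16 code units and writes `%XX` for values below 256 and `%uXXXX` for anything larger, which matches the two results quoted above. The sketch below puts that side by side with `encodeURIComponent`, which percent-encodes the UTF-8 bytes instead. The helper name `compareEncodings` is made up for this illustration, and the snippet assumes an environment that still ships the deprecated global `escape` (browsers and Node.js do).

```javascript
// Hypothetical helper: show how the same single character is treated by
// the deprecated escape() and by encodeURIComponent().
function compareEncodings(ch) {
    return {
        codeUnit: ch.charCodeAt(0),           // numeric value of the code unit
        legacyEscape: escape(ch),             // %XX below 256, %uXXXX otherwise
        uriComponent: encodeURIComponent(ch)  // percent-encoded UTF-8 bytes
    };
}

console.log(compareEncodings("á")); // codeUnit 225,  legacyEscape "%E1",    uriComponent "%C3%A1"
console.log(compareEncodings("☃")); // codeUnit 9731, legacyEscape "%u2603", uriComponent "%E2%98%83"
```

Neither output is the `\uXXXX` form the question asks for, so `escape` is mainly useful here as an explanation of the inconsistency rather than as a solution.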