JavaScript: convert a Unicode string to JavaScript escapes?

Question

I have a variable that contains a string consisting of Japanese characters, for instance:

"みどりいろ"

How would I go about converting this to its JavaScript escape form?

The result I am after for this example specifically is:

"\u306f\u3044\u3044\u308d"

I'd prefer a jQuery approach if there's a variation.

@SergeiZahharenko - `escape("abc") //"abc"`... – Derek 朕會功夫 Jan 09 '14 at 08:06

Derek 朕會功夫 · Accepted Answer · 2017-03-23T03:08:00.470

41

"み".charCodeAt(0).toString(16);

This gives you the character's UTF-16 code unit in hexadecimal. You can run it through a loop:

String.prototype.toUnicode = function(){
    var result = "";
    for(var i = 0; i < this.length; i++){
        // Assumption: all characters are < 0xffff
        result += "\\u" + ("000" + this[i].charCodeAt(0).toString(16)).substr(-4);
    }
    return result;
};

"みどりいろ".toUnicode();       //"\u307f\u3069\u308a\u3044\u308d"
"Mi Do Ri I Ro".toUnicode();  //"\u004d\u0069\u0020\u0044\u006f\u0020\u0052\u0069\u0020\u0049\u0020\u0052\u006f"
"Green".toUnicode();          //"\u0047\u0072\u0065\u0065\u006e"

Demo: http://jsfiddle.net/DerekL/X7MCy/

More on: .charCodeAt

edited Mar 23 '17 at 03:08

answered Jan 09 '14 at 07:57

Derek 朕會功夫


My bad :) For some reason I missed the `.toString(16)` part – Elad Stern Jan 09 '14 at 07:59
@EladStern - It's okay. – Derek 朕會功夫 Jan 09 '14 at 08:12
You can replace `while(partial.length !== 4) partial = "0" + partial;` with `('0000' + partial).substr(-4);` which I would prefer :) – Adassko Jan 09 '14 at 08:16
@Adassko - Ooo nice idea. – Derek 朕會功夫 Jan 09 '14 at 08:17
You can also replace your loop with a `replace` function. Then the whole function will be: `return this.replace(/./g, function(c) { return "\\u" + ('000' + c.charCodeAt(0).toString(16)).substr(-4) });` :P – Adassko Jan 09 '14 at 08:33
@Adassko - I didn't really think of it, but I think creating a new anonymous function for each character would be slower and consume more memory than a `for` loop. [(It would be very minimal though, about 5% slower)](http://jsperf.com/looping-through-a-string) – Derek 朕會功夫 Jan 09 '14 at 08:45
@Derek朕會功夫: it creates the new function only once, then just calls it for every character. No doubt it will be a little bit slower (especially since it uses a regular expression) but that shouldn't bother anyone – Adassko Jan 09 '14 at 09:56
Interesting thing is that when you change `this[i].charCodeAt(0)` to `this.charCodeAt(i)` it will be even slower than the version with `replace` :O – Adassko Jan 09 '14 at 10:05
Looks like using `map` is the fastest if you look at performance though: http://jsperf.com/looping-through-a-string/2 – Adassko Jan 09 '14 at 10:15
@Adassko That's interesting. Maybe .charCodeAt has a different character selecting algorithm or maybe the whole string has to be copied into the method before processing, making it slower. – Derek 朕會功夫 Jan 09 '14 at 19:16
@Derek朕會功夫 I have hex Unicode. How can I convert hex Unicode back to normal text? – Santosh Jadi May 30 '16 at 07:24
@SantoshJadi How is it represented? – Derek 朕會功夫 May 30 '16 at 14:30
You should never ever extend the `prototype` of built-in classes. Suppose Ecma TC39 wanted to add a `toUnicode` function: if many people were already using this one, they no longer could. – Nux May 13 '20 at 15:58
@Nux This answer was written in a time when extending the prototype was still a fairly common practice, but the landscape has since changed and you should probably export the function as a module instead. – Derek 朕會功夫 May 17 '20 at 21:33
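
Following up on the two comments above (the reverse conversion, and avoiding the prototype extension), here is a minimal sketch with plain functions. The names `toUnicodeEscape` and `fromUnicodeEscape` are hypothetical and not from the answer, and the decoder assumes the input contains only four-digit `\uXXXX` sequences.

```javascript
// Hypothetical helper names; neither appears in the answer above.
function toUnicodeEscape(str) {
    var result = "";
    for (var i = 0; i < str.length; i++) {
        // charCodeAt returns one UTF-16 code unit (0..0xffff),
        // so four hex digits are always enough.
        result += "\\u" + ("000" + str.charCodeAt(i).toString(16)).slice(-4);
    }
    return result;
}

function fromUnicodeEscape(escaped) {
    // Assumption: every escape is exactly \u followed by four hex digits.
    return escaped.replace(/\\u([0-9a-fA-F]{4})/g, function (_, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var escaped = toUnicodeEscape("みどりいろ");
console.log(escaped);                    // \u307f\u3069\u308a\u3044\u308d
console.log(fromUnicodeEscape(escaped)); // みどりいろ
```

In a module setup these two functions could be exported instead of being attached to `String.prototype`.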

Adam Leggett · Answer 2 · 2018-07-17T15:18:10.043

11

The above answer is reasonable. A slight space and performance optimization:

function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function(ch) {
        return "\\u" + ("000" + ch.charCodeAt().toString(16)).slice(-4);
    });
}
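
A quick usage sketch of the function above, repeated here so the snippet runs on its own. The sample input is made up for illustration; it shows that characters up to `~` (U+007E) pass through untouched while everything else is escaped.

```javascript
function escapeUnicode(str) {
    // Match only characters outside the range U+0000..U+007E.
    return str.replace(/[^\0-~]/g, function (ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

console.log(escapeUnicode("Green みどり")); // Green \u307f\u3069\u308a
```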

edited Jul 17 '18 at 15:18

answered Nov 12 '16 at 00:01

Adam Leggett


Adassko · Answer 3 · 2014-01-10T00:34:24.270

6

just

escape("みどりいろ")

should meet the needs for most cases, but if you need it in the form of "\u" instead of "%xx" / "%uxxxx" then you might want to use regular expressions:

~~escape("みどりいろ").replace(/%/g, '\\').toLowerCase()~~

escape("みどりいろ").replace(/%u([A-F0-9]{4})|%([A-F0-9]{2})/g, function(_, u, x) { return "\\u" + (u || '00' + x).toLowerCase() });

(toLowerCase is optional to make it look exactly like in the first post)

It doesn't escape characters it doesn't need to in most cases which may be a plus for you; if not - see Derek's answer, or use my version:

'\\u' + "みどりいろ".split('').map(function(t) { return ('000' + t.charCodeAt(0).toString(16)).substr(-4) }).join('\\u');

edited Jan 10 '14 at 00:34

answered Jan 09 '14 at 08:10

Adassko


Upvoted because this works too (only for characters other than latin letters and common punctuation marks.) – Derek 朕會功夫 Jan 09 '14 at 08:53
Fails for characters in the range U+0000 to U+001F, U+007F to U+00FF plus various punctuation marks. These characters get `escape`d to `%xx` instead of `%uxxxx`, which results in invalid backslash escapes. You would have to do two replacements, one for `%u` to `\u` and then one for `%` to `\x`. Also the `toLowerCase()` is superfluous and would lose information for unescaped characters. – bobince Jan 09 '14 at 20:33
does this pass [The Pile of Poo Test™](https://mathiasbynens.be/notes/javascript-unicode#poo-test) ? :P – törzsmókus Feb 01 '17 at 18:04
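
On the question in the last comment, here is a rough sketch of what happens with a character outside the Basic Multilingual Plane. Escaping per UTF-16 code unit, as the `charCodeAt` approaches in this thread do, yields a surrogate pair of `\uXXXX` escapes. Iterating by code point and emitting the ES2015 `\u{...}` form is an alternative. Both function names are hypothetical.

```javascript
// Hypothetical names, for illustration only.
function escapeCodeUnits(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        // One escape per UTF-16 code unit; astral characters become two escapes.
        out += "\\u" + ("000" + str.charCodeAt(i).toString(16)).slice(-4);
    }
    return out;
}

function escapeCodePoints(str) {
    // Array.from iterates a string by code point, not by code unit.
    return Array.from(str, function (ch) {
        var cp = ch.codePointAt(0);
        return cp > 0xffff
            ? "\\u{" + cp.toString(16) + "}"
            : "\\u" + ("000" + cp.toString(16)).slice(-4);
    }).join("");
}

var poo = "\u{1F4A9}"; // U+1F4A9 PILE OF POO
console.log(escapeCodeUnits(poo));  // \ud83d\udca9
console.log(escapeCodePoints(poo)); // \u{1f4a9}
```

Both outputs are valid string-literal escapes in ES2015 and later; only the surrogate-pair form works in older engines and in JSON.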

score 1 · Answer 4 · answered Apr 24 '20 at 12:03

My version of the code, based on previous answers. I use it to convert non-ASCII characters in JSON.stringify().

const toUTF8 = string =>
    string.split('').map(
        ch => !ch.match(/^[^a-z0-9\s\t\r\n_|\\+()!@#$%^&*=?/~`:;'"\[\]\-]+$/i)
            ? ch
            : '\\' + 'u' + '000' + ch.charCodeAt(0).toString(16)
    ).join('');

Usage:

JSON.stringify({key: 'Категория дли импорта'}, (key, value) => {
    if (typeof value === "string") {
        return toUTF8(value);
    }

    return value;
});

Returns JSON:

{"key":"\\u00041a\\u000430\\u000442\\u000435\\u000433\\u00043e\\u000440\\u000438\\u00044f \\u000434\\u00043b\\u000438 \\u000438\\u00043c\\u00043f\\u00043e\\u000440\\u000442\\u000430"}

Those \u sequences make no sense. – SamB Nov 26 '22 at 18:35
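
To spell out the comment above: the output has two problems. The code prepends `'000'` without trimming, so each escape carries six hex digits instead of four, and `JSON.stringify` escapes the backslashes that the replacer returns, which doubles them. A possible fix, sketched here under the assumption that the goal is ASCII-only JSON, is to escape after serialising rather than inside the replacer. `stringifyAscii` is a hypothetical name.

```javascript
// Hypothetical helper, not part of the answer above.
function stringifyAscii(value) {
    // Escape after serialising, so JSON.stringify never sees the
    // backslashes and cannot double them.
    return JSON.stringify(value).replace(/[\u007f-\uffff]/g, function (ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

var json = stringifyAscii({ key: "Категория" });
console.log(json); // {"key":"\u041a\u0430\u0442\u0435\u0433\u043e\u0440\u0438\u044f"}
console.log(JSON.parse(json).key === "Категория"); // true
```

The result is still valid JSON, and `JSON.parse` restores the original string.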

score 0 · Answer 5 · answered Feb 23 '21 at 14:36

Just use the encodeURI function:

encodeURI("みどりいろ")
"%E3%81%BF%E3%81%A9%E3%82%8A%E3%81%84%E3%82%8D"

And on the other side, decode it back:

decodeURI("%E3%81%BF%E3%81%A9%E3%82%8A%E3%81%84%E3%82%8D")
"みどりいろ"

Sándor Krisztián


Diego Raian · Answer 6 · 2020-03-26T18:35:43.083

-1

I have an answer for this question. This function I made worked for me. It escapes every character other than the ASCII letters a-z and A-Z and leaves those letters unchanged.

function toUnicode(word){
    let array = word.split("");
    array = array.map((character)=>{
        if(character.match(/[^a-zA-Z]/g)){
            let conversion = "000" + character.charCodeAt(0).toString(16)
            return "\\u" + conversion;
        }
        return character;
    });
    return array.join("")
}

edited Mar 26 '20 at 18:35

answered Mar 26 '20 at 18:25

Diego Raian

This works for some characters but for "higher" characters like ✓ it doesn't. The code from Adam Leggett (https://stackoverflow.com/a/40558081/3434804) gets the job done. – pojda Oct 23 '20 at 09:59
