21

I have an array containing strings with non-ASCII Unicode characters, written as escape sequences:

var a = [
    ["a", 33],  
    ["h\u016B", 44],
    ["s\u00EF", 51],
    ...
];

When I loop over this array:

for (var i = 0; i < a.length; i++) {
    document.write(a[i][0] + "<br />");
}

It prints the accented characters themselves:

a
hū
sï
...

but I want the escape sequences printed literally:

a
h\u016B
s\u00EF
...

How can I achieve this in JavaScript?

hippietrail
Jérôme Verstrynge

4 Answers

22

Something like this?

/* Creates an uppercase hex string, zero-padded to at least `length` digits, from a given number */
function fixedHex(number, length){
    var str = number.toString(16).toUpperCase();
    while(str.length < length)
        str = "0" + str;
    return str;
}

/* Creates a unicode literal based on the string */    
function unicodeLiteral(str){
    var i;
    var result = "";
    for( i = 0; i < str.length; ++i){
        /* You should probably replace this by an isASCII test */
        if(str.charCodeAt(i) > 126 || str.charCodeAt(i) < 32)
            result += "\\u" + fixedHex(str.charCodeAt(i),4);
        else
            result += str[i];
    }

    return result;
}

var a = [
    ["a", 33],  
    ["h\u016B", 44],
    ["s\u00EF", 51]
];

var i;
for (i=0;i<a.length;i++) {
    document.write(unicodeLiteral(a[i][0]) + "<br />");
}

Result

a
h\u016B
s\u00EF

JSFiddle

Zeta
  • Good solution but I think it should be `if(str.charCodeAt(i) > 127)` (ASCII stops at 0x7F). – dda Jun 08 '12 at 05:10
  • @dda: Doh, indeed. However `0x7F` is DEL, so `0x7E` should be a better upper bound. Edited my answer, thanks for the remark :). – Zeta Jun 08 '12 at 06:07
  • This will not display unicode like \u0050 (which is a valid ascii character) . How to handle that ? – gaurav5430 Feb 18 '16 at 06:55
  • @gaurav5430 Which wasn't intended in the original question. Remove the `if`. Note that `'\u0050'` and `"P"` have the same representation; you cannot check whether `"P"` was originally `'\u0050'`. – Zeta Feb 18 '16 at 07:03
  • @Zeta if i remove the if, it will convert everything to unicode – gaurav5430 Feb 18 '16 at 07:05
  • @gaurav5430: Yup. Again, there is ___no___ difference between `"\u0050"` and `"P"` after the browser has parsed your code (or any other string for that matter). You either display ASCII as is, or you display ASCII as unicode. There's no inbetween. – Zeta Feb 18 '16 at 07:10
  • yeah... actually my requirement depends on the position and offset of a string token in the actual string and the displayed string, which gets messed up because of this – gaurav5430 Feb 18 '16 at 07:11
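
The point made in the comments above can be checked directly. The following is only a sketch: `escapeAll` is a hypothetical name, not part of the answer, and it simply drops the `if` so that every UTF-16 code unit is escaped, ASCII included.

```javascript
// After parsing, the escape and the plain letter are the same string.
console.log("\u0050" === "P"); // true

// Hypothetical variant of unicodeLiteral without the ASCII check:
// every UTF-16 code unit is written as \uXXXX.
function escapeAll(str) {
    var result = "";
    for (var i = 0; i < str.length; i++) {
        var hex = str.charCodeAt(i).toString(16).toUpperCase();
        result += "\\u" + ("0000" + hex).slice(-4); // zero-pad to four digits
    }
    return result;
}

console.log(escapeAll("P")); // \u0050
```

Because the original spelling is lost at parse time, the choice is between escaping nothing in the ASCII range and escaping all of it.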
7

JavaScript's String.prototype.charCodeAt() should help. For example,

"test".charCodeAt(0) returns 116, the numeric code for "t".

Beyond that, you need an if statement that checks whether each character is outside the ASCII range, and escapes it only in that case.
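
As a rough sketch of that check, assuming "ASCII" here means the printable range 0x20 to 0x7E (the helper names `isPrintableAscii` and `escapeNonAscii` are made up for illustration):

```javascript
// Hypothetical helper: true when the UTF-16 code unit is printable ASCII.
function isPrintableAscii(code) {
    return code >= 0x20 && code <= 0x7E;
}

// Hypothetical helper: keep printable ASCII as is, write everything else as \uXXXX.
function escapeNonAscii(str) {
    var result = "";
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (isPrintableAscii(code)) {
            result += str.charAt(i);
        } else {
            var hex = code.toString(16).toUpperCase();
            result += "\\u" + ("0000" + hex).slice(-4); // zero-pad to four digits
        }
    }
    return result;
}

console.log(escapeNonAscii("h\u016B")); // h\u016B
console.log(escapeNonAscii("s\u00EF")); // s\u00EF
```

Note that charCodeAt() works on UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as two \uXXXX escapes (a surrogate pair).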

user1417475
7

If you have a Unicode character and want its escape sequence as a string, you can do this:

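var x = "h\u016B";
// Here the non-ASCII character is the second one (index 1)
var uniChar = x.charCodeAt(1).toString(16); // "16b"
uniChar = uniChar.toUpperCase(); // "16B"
// Zero-pad to four hex digits; a hard-coded "\\u0" prefix only works for three-digit codes
uniChar = "\\u" + ("0000" + uniChar).slice(-4); // the text \u016B (written "\\u016B" as a literal)
x = x.charAt(0) + uniChar; // x is now the text h\u016B, which prints as you wish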
zeacuss
7

I got here while trying to answer the question "Javascript: display unicode as it is", which was closed as a duplicate of this one.

Here is another approach: it is also possible (at least in some modern browsers) to use the String.raw function.

The syntax is:

var rawStr = String.raw`Hello \u0153`;

Here is a working fiddle (Chrome, FF): http://jsfiddle.net/w9L6qgt6/1/
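
One caveat, shown as a small sketch (the variable names other than `rawStr` are only illustrative): String.raw keeps an escape as typed only when it is written directly inside the template literal in source code. A string that has already been parsed holds the real character, and passing it through String.raw does not bring the escape back.

```javascript
// Escape written directly in the tagged template literal: kept as typed.
var rawStr = String.raw`Hello \u0153`;
console.log(rawStr);        // Hello \u0153
console.log(rawStr.length); // 12

// An already-parsed string contains the character itself, not the escape.
var parsed = "Hello \u0153";
console.log(parsed.length); // 7

// Substituting it into String.raw returns it unchanged.
var viaRaw = String.raw`${parsed}`;
console.log(viaRaw === parsed); // true
```

So for strings stored in an array, as in the question, a charCodeAt()-based conversion is still needed.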

Dominik