Say I have an element like this...

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mo class="symbol">α</mo>
</math>

Is there a way to get the Unicode/hex value of alpha (α), i.e. &#x03B1;, using JavaScript/jQuery? Something like...

$('.symbol').text().unicode(); // I know unicode() doesn't exist
$('.symbol').text().hex(); // I know hex() doesn't exist

I need &#x03B1; instead of α, and it seems like any time I insert &#x03B1; into the DOM and try to retrieve it right away, it gets rendered and I can't get &#x03B1; back; I just get α.

ChrisF
Hristo

4 Answers

Using mostly plain JavaScript, you should be able to do:

function entityForSymbolInContainer(selector) {
    var code = $(selector).text().charCodeAt(0);
    var codeHex = code.toString(16).toUpperCase();
    while (codeHex.length < 4) {
        codeHex = "0" + codeHex;
    }

    return "&#x" + codeHex + ";";
}

Here's an example: http://jsfiddle.net/btWur/

aroth

charCodeAt returns the UTF-16 code unit at the given index as a number (shown here in decimal):

"α".charCodeAt(0); //returns 945
0x03b1 === 945; //returns true

toString(16) will then give you the hex string:

(945).toString(16); // returns "3b1"

(Confirmed to work in IE9 and Chrome)
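As a small worked check of the two calls above, the sketch below goes from the character to its hex string and back again. The variable names are only illustrative, and `padStart` assumes an ES2017-or-later environment, which is newer than the browsers mentioned in this answer.

```javascript
// Forward: character -> UTF-16 code unit -> hex string.
const code = "α".charCodeAt(0);   // 945
const hex = code.toString(16);    // "3b1"

// Reverse: hex string -> number -> character.
const back = String.fromCharCode(parseInt(hex, 16)); // "α"

// Zero-padded, uppercase form as it would appear in an HTML entity.
const entity = "&#x" + hex.toUpperCase().padStart(4, "0") + ";"; // "&#x03B1;"
```

This only holds for characters that fit in a single UTF-16 code unit; anything outside the BMP needs two code units.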

Jim Deville

If you try to convert a Unicode character outside the BMP (Basic Multilingual Plane) in the ways above, you are in for a nasty surprise. Characters outside the BMP are encoded as two UTF-16 code units (a surrogate pair), for example:

"🔒".length === 2 (one part for the shackle, one part for the lock base :) )

so "🔒".charCodeAt(0) will give you 55357, which is only 'half' of the number, while "🔒".charCodeAt(1) will give you 56594, which is the other half.

To get the full code point from those two values you might want to use the following string extension function:

String.prototype.charCodeUTF32 = function () {
    // Assumes the string starts with a surrogate pair (a character outside the BMP).
    return (this.charCodeAt(0) - 0xD800) * 0x400 + (this.charCodeAt(1) - 0xDC00) + 0x10000;
};

You can also use it like this

"&#x" + "🔒".charCodeUTF32().toString(16) + ";"

to get HTML hex codes.

Hope this saves you some time.
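In ES2015 and later, the built-in `codePointAt` does the surrogate arithmetic for you, and `for...of` walks a string by code point rather than by code unit. A minimal sketch, assuming such an environment; `toHexEntities` is a hypothetical helper name, not something from this answer:

```javascript
// Hypothetical helper: converts every character of a string to a hex entity.
function toHexEntities(text) {
    let out = "";
    // for...of iterates by code point, so a surrogate pair arrives as one item.
    for (const ch of text) {
        const hex = ch.codePointAt(0).toString(16).toUpperCase();
        // Pad to at least four digits; longer code points are left as they are.
        out += "&#x" + hex.padStart(4, "0") + ";";
    }
    return out;
}

toHexEntities("α");  // "&#x03B1;"
toHexEntities("🔒"); // "&#x1F512;" (the pair 55357, 56594 combined)
```

Because it never calls `charCodeAt(1)`, the same function handles BMP and non-BMP characters without a length check.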

Matas Vaitkevicius
  • +1 Thanks for saving us from this landmine! Checking the length of the character was the key for me. – L0j1k Jun 28 '16 at 23:41
  • Good insight, and note that not just emojis are beyond the BMP :) Your prototype enhancement should probably check the length first; for "UTF-8" strings the `this.charCodeAt(1)` will return `NaN`, and so will the entire function as a consequence; for "length === 1" chars it should just return `charCodeAt(0)` as such. – kontur Feb 18 '21 at 10:00

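For example, in case you need to convert this hex string (a sequence of UTF-8 bytes) to Unicode text:

e68891e4bda0e4bb96

  1. Take the hex string two characters (one byte) at a time.
  2. If the byte's decimal value is over 127, prefix its two hex digits with a %.
  3. URL-decode the resulting string and return it.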

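    function hex2a(hex) {
        var str = '';
        // Walk the hex string one byte (two hex digits) at a time.
        for (var i = 0; i < hex.length; i += 2) {
            var pair = hex.substr(i, 2);
            var dec = parseInt(pair, 16);
            // Keep ASCII bytes as characters; percent-encode the rest.
            str += dec > 127 ? "%" + pair : String.fromCharCode(dec);
        }
        // decodeURI interprets the percent-encoded bytes as UTF-8.
        return decodeURI(str);
    }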
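The three steps above can also be sketched more compactly by percent-encoding every byte, not just those over 127, and handing the result to `decodeURIComponent`. This is a variation on the answer rather than the same function; `hexToUtf8` is a hypothetical name, and the input check is an added assumption about what counts as valid input.

```javascript
// Hypothetical helper: decodes a hex string of UTF-8 bytes into text.
function hexToUtf8(hex) {
    if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
        throw new Error("expected an even number of hex digits");
    }
    // Prefix each two-digit byte with "%", then let the URI decoder
    // interpret the percent-encoded bytes as UTF-8.
    return decodeURIComponent(hex.replace(/../g, "%$&"));
}

hexToUtf8("e68891e4bda0e4bb96"); // "我你他"
hexToUtf8("48692521");           // "Hi%!"
```

Encoding every byte means an ASCII byte that happens to be a literal % (0x25) is decoded safely instead of being read as the start of an escape sequence. Input that is not valid UTF-8 still makes `decodeURIComponent` throw a `URIError`.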

chings228