Say you have a binary string (a string of bits): "0100010101110001010...". How can you convert it into a UTF-16 string (e.g. "A|b☮"), and how can you convert that back into the original binary string?

I have attempted the implementation below, but my understanding of UTF-16 does not seem to be good enough, and the code breaks in some cases (I don't know which).

var pad = function(x){
    while(x.length%16!==0)
        x="0"+x;
    return x;
}
var unpack_bin = function(a){
    for(var r="",i=0,l=a.length;i<l;++i)
        r+=pad((a[i].charCodeAt(0)-36).toString(2));
    return r.slice(r.indexOf("1")+1);
}
var pack_bin = function(a) {
    for (var s="",i=0,l=a.length,a=pad("1"+a);i<l;i+=16) 
        s+=String.fromCharCode(parseInt(a.slice(i,i+16),2)+36);
    return s;
}
MaiaVictor
  • I don't think you can; certain Unicode "charcodes" don't like sitting next to each other, so what comes out might differ from what goes in. – dandavis Feb 20 '14 at 03:40

1 Answer

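You can't safely fit 16 arbitrary bits into one UTF-16 code unit, because the values 0xD800 to 0xDFFF are surrogates and are not valid characters on their own, but you can fit 14 bits.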
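
To see why arbitrary 16-bit values are risky, here is a minimal sketch of what can happen to a lone surrogate once the string leaves JavaScript. It assumes an environment where `TextEncoder` and `TextDecoder` are available as globals (modern browsers and recent Node.js versions); the variable names are only illustrative.

```javascript
// Build a one-unit string holding a lone high surrogate (0xD800).
const original = String.fromCharCode(0xD800);

// Serialize to UTF-8 and back, as happens when text is saved or sent.
// TextEncoder replaces the unpaired surrogate with U+FFFD.
const bytes = new TextEncoder().encode(original);
const restored = new TextDecoder().decode(bytes);

console.log(original.charCodeAt(0).toString(16)); // d800
console.log(restored.charCodeAt(0).toString(16)); // fffd
```

The original 16 bits are gone after the round trip, which is the kind of silent corruption the packing scheme has to avoid.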

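CJK Unified Ideographs is a contiguous block of 20941 valid Unicode characters starting at U+4E00, each encoded by a single UTF-16 code unit. Since 2^14 = 16384 is less than 20941, every 14-bit value can be mapped to the character at 0x4E00 plus that value without leaving the block.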

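// Left-pad x with "0" until its length is a multiple of div.
function pad(x, div){
    while(x.length%div!==0)
        x="0"+x;
    return x;
}

// Turn a packed string back into the original binary string.
function unpack_bin(packedString){
    var binString = "";
    for(var i=0; i<packedString.length; ++i) {
        // Each character carries 14 bits, offset by 0x4E00.
        var binValue = packedString.charCodeAt(i)-0x4E00;
        binString += pad(binValue.toString(2), 14);
    }
    // Drop the padding zeros and the "1" marker added by pack_bin.
    return binString.slice(binString.indexOf("1")+1);
}

// Pack a binary string into CJK characters, 14 bits per character.
function pack_bin(binString) {
    // The leading "1" marks where the real data starts, so leading zeros survive.
    binString = pad("1"+binString, 14);
    var packedString = "";
    for(var i=0; i<binString.length; i+=14) {
        var charCode = parseInt(binString.slice(i, i+14), 2)+0x4E00;
        packedString += String.fromCharCode(charCode);
    }
    return packedString;
}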
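
As a rough check of the scheme, the sketch below restates the same 14-bit packing in compact form and verifies a few round trips, including inputs with leading zeros and the empty string. The names `packBits`, `unpackBits`, `BASE`, `BITS` and `samples` are hypothetical and not part of the answer's code; `padStart` and `repeat` assume an ES2017-capable engine.

```javascript
const BASE = 0x4E00; // first code point of CJK Unified Ideographs
const BITS = 14;     // payload bits per UTF-16 code unit

function packBits(bits) {
  // Prefix a "1" marker, then left-pad with zeros to a multiple of BITS.
  let padded = "1" + bits;
  const rem = padded.length % BITS;
  if (rem !== 0) padded = "0".repeat(BITS - rem) + padded;
  let out = "";
  for (let i = 0; i < padded.length; i += BITS) {
    out += String.fromCharCode(parseInt(padded.slice(i, i + BITS), 2) + BASE);
  }
  return out;
}

function unpackBits(packed) {
  let bits = "";
  for (let i = 0; i < packed.length; i++) {
    bits += (packed.charCodeAt(i) - BASE).toString(2).padStart(BITS, "0");
  }
  // Everything up to and including the first "1" is padding plus marker.
  return bits.slice(bits.indexOf("1") + 1);
}

const samples = ["", "0", "1", "0001", "01000101011100010100",
                 "1".repeat(14), "0".repeat(29)];
const allRoundTrip = samples.every(s => unpackBits(packBits(s)) === s);
console.log(allRoundTrip); // true
```

The largest code unit this can produce is 0x4E00 + 0x3FFF = 0x8DFF, so the output never reaches the surrogate range 0xD800 to 0xDFFF.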

See also: Twitter image encoding challenge

Anton
  • That sounds great but this isn't valid JavaScript. Did you just sketch that? I guess [this](http://lpaste.net/100178) is what you meant. It would be really cool if you fixed it and included a method for unpacking. – MaiaVictor Feb 20 '14 at 16:25
  • Oh, nevermind, I just understood what you've done. Here is a working code for someone needing it: http://lpaste.net/100183 – MaiaVictor Feb 20 '14 at 16:49
  • @Viclib You're right, the code wasn't valid; I should have tested it before posting. I fixed the answer. – Anton Feb 20 '14 at 17:10