5

Does anybody know a script that can convert a string to an ArrayBuffer using a Unicode encoding?

I'm creating a browser-side equivalent of the "Buffer" of node.js. The only encoding left to implement is Unicode; all the others are done.

Thanks for your help!

Van Coding
  • Which Unicode encoding: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE? There are quite a few. – Dan D. Jan 25 '12 at 16:56
  • The node.js docs say it's the Unicode BMP (Basic Multilingual Plane) encoding. – Van Coding Jan 25 '12 at 16:58
  • The Basic Multilingual Plane is a Unicode abstraction, not an encoding, and it applies to all of the encodings listed above. UTF-16LE is used in JavaScript browser engines, and judging by your answer, that is the one you mean. – kirilloid Jan 25 '12 at 17:50
  • 1
    is your Buffer port open source? – Janus Troelsen Sep 19 '12 at 17:39
  • 1
    @Janus Troelsen I haven't published it on github, but if you wish I can do it. But there are better ones, I think. Just search for "buffer browserify" on github and you'll find very good code. One repo is also used by node-browserify. Hope it helps. – Van Coding Sep 19 '12 at 19:03
  • But I'd very much like it anyway, since SlowBuffer.prototype.copy is missing – Janus Troelsen Sep 19 '12 at 19:49
  • @VanCoding: Would it be possible? – Janus Troelsen Sep 21 '12 at 11:03
  • @Janus Troelsen my implementation does not have all the features of node's original implementation. "copy", for example, is also missing in my implementation. Also, I've never used it in production, so I really recommend that you use something from github. Those are also faster. – Van Coding Sep 21 '12 at 11:52
  • @VanCoding: Yeah I found buffer-browserify on GitHub but it was riddled with bugs ({read,write}{UInt,Int}{8,16,32}{LE,BE} wasn't working). But I think I fixed them now. But don't be so modest, even if you did a bad job it could still be a good job compared to others. Anyway, Grüezi :D – Janus Troelsen Sep 21 '12 at 11:59
  • @Janus Troelsen here you are: https://github.com/VanCoding/broffer.js.git – Van Coding Sep 21 '12 at 17:42
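
To illustrate the byte-order question raised in the comments above, here is a minimal sketch of how the same string comes out in UTF-16LE and UTF-16BE. The helper name `utf16Bytes` is hypothetical and is not taken from node.js or from the asker's code; it only relies on the standard `DataView.setUint16` call.

```javascript
// Hypothetical helper: write each UTF-16 code unit of `str` as two bytes.
// `littleEndian` selects UTF-16LE (true) or UTF-16BE (false).
function utf16Bytes(str, littleEndian) {
    var buf = new ArrayBuffer(str.length * 2);
    var view = new DataView(buf);
    for (var i = 0; i < str.length; i++) {
        view.setUint16(i * 2, str.charCodeAt(i), littleEndian);
    }
    return new Uint8Array(buf);
}

// "A" is U+0041 and the euro sign is U+20AC.
var le = utf16Bytes("A\u20AC", true);  // 0x41 0x00 0xAC 0x20
var be = utf16Bytes("A\u20AC", false); // 0x00 0x41 0x20 0xAC
```

The only difference between the two results is the order of the two bytes inside each code unit, which is why the encoding has to be named precisely.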

1 Answer

8

I figured it out myself.

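Encoding (string to bytes):

// Each UTF-16 code unit of str becomes two bytes, high byte first (big-endian).
var b = new Uint8Array(str.length*2);
for(var i = 0; i < b.length; i+=2){
    var x = str.charCodeAt(i/2);
    b[i] = x >> 8;     // high byte
    b[i+1] = x & 0xff; // low byte
}

Decoding (bytes to string):

// this is the byte array; every two bytes form one UTF-16 code unit.
var s = "";
for(var i = 0; i + 1 < this.length; i += 2){
    s += String.fromCharCode(this[i]*256+this[i+1]);
}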
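
For reuse, the two loops could be wrapped in functions and checked with a round trip. This is only a sketch: the names `stringToBytesBE` and `bytesToStringBE` are made up for illustration, and the byte order (high byte first) simply follows the loops in this answer.

```javascript
// Hypothetical wrapper: string -> bytes, two bytes per UTF-16 code unit, high byte first.
function stringToBytesBE(str) {
    var bytes = new Uint8Array(str.length * 2);
    for (var i = 0; i < str.length; i++) {
        var unit = str.charCodeAt(i);
        bytes[i * 2] = unit >> 8;       // high byte
        bytes[i * 2 + 1] = unit & 0xff; // low byte
    }
    return bytes;
}

// Hypothetical wrapper: bytes -> string; a trailing odd byte is ignored.
function bytesToStringBE(bytes) {
    var s = "";
    for (var i = 0; i + 1 < bytes.length; i += 2) {
        s += String.fromCharCode(bytes[i] * 256 + bytes[i + 1]);
    }
    return s;
}

// "h", "é", and an emoji that JavaScript stores as a surrogate pair (two code units).
var sample = "h\u00E9\uD83D\uDE00";
var roundTrip = bytesToStringBE(stringToBytesBE(sample));
```

Because both functions work on code units rather than code points, characters outside the BMP survive the round trip as long as both halves of the surrogate pair are kept together.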
Van Coding
  • 4
    `s += String.fromCharCode(this[i++]*256+this[i++]);` would be slow for long strings. Gather charcodes in array `arr` and execute `String.fromCharCode.apply(arr)`. – kirilloid Jan 25 '12 at 17:47
  • 3
    Ouch, sorry. `String.fromCharCode.apply(*null*, arr)` – kirilloid Jan 25 '12 at 17:56
  • *ROFL*. I just faced the same problem when transferring data from a Java applet into JavaScript. – kirilloid Jan 25 '12 at 20:04
  • 1
    some unicode characters use more than 2 bytes, so I'm not sure how you detect those etc, it's a long spec and it's been a while since I browsed it. – J Chris A Jun 27 '12 at 21:46
  • This doesn't seem to work. sha1sum the bytes of "hello world" (in your terminal) and then convert it with that method and you'll get something completely different using the Web Crypto API. It may *contain* the string, but it doesn't *convert* it. See https://gist.github.com/coolaj86/87d834cfe6ec07d2ee81 I still haven't figured it out for multi-byte characters, but I have gotten single-byte characters to match sha1sums as expected. – coolaj86 Jan 05 '15 at 03:03
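
Two points from the comments above can be sketched in code. First, the `String.fromCharCode.apply(null, arr)` suggestion: the version below applies it in chunks, since passing a very large array to `apply` can exceed an engine's argument limit. The function name `bytesToStringChunked` and the chunk size are assumptions chosen for illustration. Second, the sha1sum mismatch is most likely an encoding difference: a terminal would normally hash the UTF-8 bytes of "hello world" (one byte per character), while the answer produces two bytes per character. The standard `TextEncoder` gives the UTF-8 bytes.

```javascript
// Hypothetical helper: decode big-endian UTF-16 bytes, building the string in chunks.
function bytesToStringChunked(bytes) {
    var CHUNK = 8192; // code units per apply() call; an arbitrary, conservative size
    var evenLength = bytes.length - (bytes.length % 2); // ignore a trailing odd byte
    var parts = [];
    for (var start = 0; start < evenLength; start += CHUNK * 2) {
        var end = Math.min(start + CHUNK * 2, evenLength);
        var codes = [];
        for (var i = start; i < end; i += 2) {
            codes.push(bytes[i] * 256 + bytes[i + 1]);
        }
        parts.push(String.fromCharCode.apply(null, codes));
    }
    return parts.join("");
}

var hi = bytesToStringChunked(new Uint8Array([0x00, 0x68, 0x00, 0x69])); // "hi"

// UTF-8 bytes of the same kind of input a terminal would typically hash.
var utf8 = new TextEncoder().encode("hello world"); // 11 bytes, one per ASCII character
```

If the goal is to match hashes computed by command-line tools, the UTF-8 bytes are probably the ones to feed into the Web Crypto API; the UTF-16 bytes are the right choice only when the other side also expects UTF-16.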