The JavaScript method String.fromCharCode() behaves equivalently to Python's unichr() in the following sense:

print unichr(213) # prints Õ on the console 
console.log(String.fromCharCode(213)); // prints Õ on the console as well

For my purposes, however, I need a JavaScript equivalent to the Python function chr(). Is there such a JavaScript function or a way to make String.fromCharCode() behave like chr()?

That is, I need something in JavaScript that mimics

print chr(213) # prints � on the console
chessweb
  • You should add more explanation of what you are trying to do and how `String.fromCharCode` is not doing it for you. If you merely want to show the Unicode replacement character in the console, you can do `String.fromCharCode(0xFFFD)` – Esailija Jul 29 '12 at 10:31
  • (As a side note, you shouldn't be doing `print unichr(stuff)`.) – Julian Jul 29 '12 at 16:24

2 Answers

So it turns out you just want to work with raw bytes in Node.js; there is a module for that (Buffer). If you are a real wizard, you can get this stuff to work with JavaScript strings alone, but it's harder and far less efficient.

var b = new Buffer(1);
b[0] = 213;

console.log(b.toString()); //�


var b = new Buffer(3);
b[0] = 0xE2;
b[1] = 0x98;
b[2] = 0x85;

console.log(b.toString()); //★
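
In current Node.js releases the `new Buffer(size)` constructor is deprecated in favour of `Buffer.alloc` and `Buffer.from`. As a rough sketch of a `chr()` stand-in built on those, assuming a Node.js version that provides `Buffer.from` (the helper name `chr` is made up here to mirror Python 2; it is not a built-in):

```javascript
// Hypothetical helper, named after Python 2's chr(); not part of Node.js.
// Returns a one-byte Buffer rather than a JavaScript string.
function chr(n) {
    if (!Number.isInteger(n) || n < 0 || n > 255) {
        throw new RangeError('chr() arg not in range(256)');
    }
    return Buffer.from([n]);
}

// Concatenate single bytes the way Python 2 concatenates chr() results.
var star = Buffer.concat([chr(0xE2), chr(0x98), chr(0x85)]);

console.log(chr(213));              // <Buffer d5>
console.log(star.toString('utf8')); // ★
```

Keeping the result as a Buffer, and only calling `toString()` when text is actually needed, avoids any accidental re-encoding of the bytes.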

print chr(213) # prints � on the console

So this prints a raw byte (0xD5), which is (most likely) interpreted as UTF-8. On its own it is not a valid UTF-8 byte sequence, so it is displayed as the replacement character (�).

The interpretation as UTF-8 is not relevant here; you most likely just want raw bytes.
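
If a port already builds strings with `String.fromCharCode`, one possible bridge back to raw bytes is to encode the string as Latin-1, where every code unit from 0 to 255 becomes exactly one byte. A minimal sketch, assuming a Node.js version where `Buffer.from` accepts the `'latin1'` encoding name:

```javascript
// Build a "binary string" the way a naive port of chr() would.
var s = String.fromCharCode(213) + String.fromCharCode(65);

// 'latin1' maps each code unit 0..255 to a single byte.
var asLatin1 = Buffer.from(s, 'latin1');

// 'utf8' encodes U+00D5 as two bytes, which is not what chr(213) produces.
var asUtf8 = Buffer.from(s, 'utf8');

console.log(asLatin1); // <Buffer d5 41>
console.log(asUtf8);   // <Buffer c3 95 41>
```

The difference between the two buffers is the kind of mismatch that shows up when such a string is written out with the default UTF-8 encoding.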

To create raw bytes in JavaScript you could use Uint8Array.

var a = new Uint8Array(1);
a[0] = 213;

You could then optionally interpret the raw bytes as UTF-8:

console.log( utf8decode(a)); // "�"

//Not recommended for production use ;D
//Doesn't handle code points above the BMP (String.fromCharCode truncates them to 16 bits), to keep the answer shorter
function utf8decode(uint8array) {
    var codePoints = [],
        i = 0,
        byte, codePoint, len = uint8array.length;
    for (i = 0; i < len; ++i) {
        byte = uint8array[i];

        if ((byte & 0xF8) === 0xF0 && len > i + 3) {

            codePoint = ((byte & 0x7) << 18) | ((uint8array[++i] & 0x3F) << 12) | ((uint8array[++i] & 0x3F) << 6) | (uint8array[++i] & 0x3F);
            if (!(0xFFFF < codePoint && codePoint <= 0x10FFFF)) {
                codePoints.push(0xFFFD, 0xFFFD, 0xFFFD, 0xFFFD);
            } else {
                codePoints.push(codePoint);
            }
        } else if ((byte & 0xF0) === 0xE0 && len > i + 2) {

            codePoint = ((byte & 0xF) << 12) | ((uint8array[++i] & 0x3F) << 6) | (uint8array[++i] & 0x3F);
            if (!(0x7FF < codePoint && codePoint <= 0xFFFF)) {
                codePoints.push(0xFFFD, 0xFFFD, 0xFFFD);
            } else {
                codePoints.push(codePoint);
            }
        } else if ((byte & 0xE0) === 0xC0  && len > i + 1) {

            codePoint = ((byte & 0x1F) << 6) | ((uint8array[++i] & 0x3F));
            if (!(0x7F < codePoint && codePoint <= 0x7FF)) {
                codePoints.push(0xFFFD, 0xFFFD);
            } else {
                codePoints.push(codePoint);
            }
        } else if ((byte & 0x80) === 0x00) {
            codePoints.push(byte & 0x7F);
        } else {
            codePoints.push(0xFFFD);
        }
    }
    return String.fromCharCode.apply(String, codePoints);
}

What you are most likely trying to do has nothing to do with interpreting the bytes as UTF-8, though.

Another example:

// UTF-8 encoding of the black star U+2605 ★:
var a = new Uint8Array(3);
a[0] = 0xE2;
a[1] = 0x98;
a[2] = 0x85;
utf8decode(a) === String.fromCharCode(0x2605); // true
utf8decode(a); // ★

In Python 2.7 (Ubuntu):

print chr(0xE2) + chr(0x98) + chr(0x85)
#prints ★
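
As an alternative to the hand-written decoder, environments that expose the standard `TextDecoder` global (modern browsers and recent Node.js versions, which is an assumption about your runtime) can do the same interpretation step. A short sketch:

```javascript
// Non-fatal mode is the default: invalid input becomes U+FFFD instead of throwing.
var decoder = new TextDecoder('utf-8');

// A lone 0xD5 is an incomplete two-byte sequence, so it decodes to U+FFFD.
var lone = decoder.decode(new Uint8Array([213]));

// The three bytes E2 98 85 are the UTF-8 encoding of U+2605.
var star = decoder.decode(new Uint8Array([0xE2, 0x98, 0x85]));

console.log(lone === '\uFFFD'); // true
console.log(star === '\u2605'); // true
```

Unlike the function above, this also covers code points outside the BMP, but it is still only about displaying bytes as text, not about producing them.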
Esailija
  • Before I dive into your code, could you please tell me whether it still applies in light of the clarification I gave in my comment to Niko's answer? – chessweb Jul 29 '12 at 11:09
  • @chessweb Well you didn't say what you want to do. The only thing I could get from your question is that you want to write raw bytes and interpret them as UTF-8. So that's what my answer does. – Esailija Jul 29 '12 at 11:11
  • Well, here goes. I'm developing a Node.js/HTML5 interface to the Free International Chess Server. In that context I'm currently porting a tested and working encryption algorithm written in Python that is used for timesealing moves when playing chess on that server. The algorithm in Python makes use of the chr function. I replaced that with String.fromCharCode in my port. It didn't work. Then I replaced chr with unichr in the Python code and noticed that it produced the same result as my JavaScript port with String.fromCharCode. So it all boils down to finding a JavaScript counterpart to chr. – chessweb Jul 29 '12 at 11:26
  • @chessweb You want to work with raw bytes, so you shouldn't use `String.fromCharCode` at all. Especially since you are in Node.js, you can use http://nodejs.org/api/buffer.html. JavaScript doesn't have a counterpart to `chr` since it internally works with UTF-16 strings only. – Esailija Jul 29 '12 at 11:31
  • The node-buffer stuff does look promising. I'll give it a try. Thanks. – chessweb Jul 29 '12 at 11:41
  • @chessweb I have now added Node.js examples to the top of the answer – Esailija Jul 29 '12 at 11:57
  • @Esailija Typed arrays are also supported in IE10, Opera 11.60 and Safari 5.1. You can omit the "Not really supported" part. – Rob W Jul 29 '12 at 12:04
  • @RobW done :) though I wouldn't have minded if you had edited it out :P – Esailija Jul 29 '12 at 12:06
  • @Esailija This is just to let you know that you made my week with your link to the Node.js Buffer class. It works like a charm. Thanks again. – chessweb Jul 29 '12 at 12:28

If you want this "question mark in a box" for every number that is not in the standard ASCII table, how about this little function?

function chr(c) {
    return (c < 0 || c > 126) ? '�' : String.fromCharCode(c);
}
Niko
  • No, that is not what I mean. From the console output of chr(213) and unichr(213) it is clear that those Python functions behave differently in the range from 128 to 255. Now, String.fromCharCode(213) emits the same output as unichr(213). So String.fromCharCode is a JavaScript counterpart to unichr. What I need is a JavaScript counterpart to chr in the sense that it emits the same output as chr for inputs from 128 to 255. – chessweb Jul 29 '12 at 11:02