
For encoding, JavaScript uses the standard ASCII table to map characters to codes. I found the function below, which correctly encodes to Ascii85/Base85. But I want to encode to the Z85 variation because it contains the set of symbols that I require. My understanding is that the encoding should work exactly the same, except that Z85 uses a different set of symbols from standard Ascii85 and maps the values to them in a different order. So the character set is the only difference:

Ascii85 uses 85 characters, codes 33 through 117 (reference): "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstu"

Z85 uses a custom set of 85 characters (reference): 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#
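
To make the difference between the two character sets concrete, here is a minimal sketch. The names `Z85_ALPHABET`, `ASCII85_ALPHABET`, and `isContiguous` are illustrative only. It shows that the Ascii85 set is one contiguous run of character codes, which is why adding 33 to a digit is enough, while the Z85 set is not contiguous and so seems to need a lookup table.

```javascript
// The Z85 alphabet, copied from the character set quoted above.
const Z85_ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#";

// Ascii85 digits are the contiguous codes 33 ('!') through 117 ('u'),
// so the whole alphabet can be generated from the digit value alone.
const ASCII85_ALPHABET = Array.from({ length: 85 }, (_, d) =>
  String.fromCharCode(d + 33)
).join("");

// True when every digit d maps to the character code d + offset.
function isContiguous(alphabet, offset) {
  return [...alphabet].every((ch, d) => ch.charCodeAt(0) === d + offset);
}

console.log(isContiguous(ASCII85_ALPHABET, 33)); // true
console.log(isContiguous(Z85_ALPHABET, 48)); // false: 'a' follows '9'
```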

My question is, is there any way to redefine the character set that charCodeAt and fromCharCode refer to in this function so that it would then encode in Z85?

// By Steve Hanov. Released to the public domain.
function encodeAscii85(input) {
  // The Adobe standard prefix "<~" is omitted, so start with an empty string.
  var output = "";
  var chr1, chr2, chr3, chr4, chr, enc1, enc2, enc3, enc4, enc5;
  var i = 0;

  while (i < input.length) {
    // Access past the end of the string is intentional.
    chr1 = input.charCodeAt(i++);
    chr2 = input.charCodeAt(i++);
    chr3 = input.charCodeAt(i++);
    chr4 = input.charCodeAt(i++);

    chr = ((chr1 << 24) | (chr2 << 16) | (chr3 << 8) | chr4) >>> 0;

    enc1 = (chr / (85 * 85 * 85 * 85) | 0) % 85 + 33;
    enc2 = (chr / (85 * 85 * 85) | 0) % 85 + 33;
    enc3 = (chr / (85 * 85) | 0 ) % 85 + 33;
    enc4 = (chr / 85 | 0) % 85 + 33;
    enc5 = chr % 85 + 33;

    output += String.fromCharCode(enc1) +
      String.fromCharCode(enc2);
    if (!isNaN(chr2)) {
      output += String.fromCharCode(enc3);
      if (!isNaN(chr3)) {
        output += String.fromCharCode(enc4);
        if (!isNaN(chr4)) {
          output += String.fromCharCode(enc5);
        }
      }
    }
  }
// Remove Adobe standard suffix
//  output += "~>";

  return output;
}

Extra notes:

Alternatively, I thought I could use something like the following function, but the problem is that it doesn't properly encode Ascii85 in the first place. If it were correct, Hello world! should encode to 87cURD]j7BEbo80, but this function encodes it to RZ!iCB=*gD0D5_+ (reference).

I don't understand the algorithm enough to know what is wrong with the mapping here. Ideally, if it was encoding correctly, I should be able to update this function to use the Z85 character set:

// Adapted from: Ascii85 JavaScript implementation, 2012.10.16 Jim Herrero
// Original: https://jsfiddle.net/nderscore/bbKS4/
var Ascii85 = {
    // Ascii85 mapping
    _alphabet: "!\"#$%&'()*+,-./0123456789:;<=>?@"+
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"+
               "abcdefghijklmnopqrstu"+

               "y"+ // short form 4 spaces (optional)
               "z", // short form 4 nulls (optional)

    // functions
    encode: function(input) {
        var alphabet = Ascii85._alphabet,
            useShort = alphabet.length > 85,
            output = "", buffer, val, i, j, l;

        for (i = 0, l = input.length; i < l;) {
            buffer = [0,0,0,0];
            for (j = 0; j < 4; j++)
                if(input[i])
                  buffer[j] = input.charCodeAt(i++);

            for (val = buffer[3], j = 2; j >= 0; j--)
                val = val*256+buffer[j];

            if (useShort && !val) 
                output += alphabet[86];
            else if (useShort && val == 0x20202020) 
                output += alphabet[85];
            else {
                for (j = 0; j < 5; j++) {
                    output += alphabet[val%85];
                    val = Math.floor(val/85);
                }
            }
        }

        return output;
    }
};
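
One plausible reading of the mismatch, offered as a sketch rather than a definitive diagnosis: the function above packs each group of four bytes least-significant byte first and also emits the five base-85 digits least-significant first, whereas Ascii85 works most-significant first in both steps. The helper `encodeGroup` and its `msbFirst` flag are hypothetical names used only to compare the two orders on the first group, "Hell".

```javascript
// Ascii85 alphabet: character codes 33 ('!') through 117 ('u').
const ALPHABET = Array.from({ length: 85 }, (_, d) =>
  String.fromCharCode(d + 33)
).join("");

// Encode one 4-character group. msbFirst selects both the byte packing
// order and the order in which the five base-85 digits are written.
function encodeGroup(group, msbFirst) {
  const bytes = [...group].map((ch) => ch.charCodeAt(0));
  if (!msbFirst) bytes.reverse(); // mimic the packing in the function above

  let val = 0;
  for (const b of bytes) val = val * 256 + b;

  // Collect the digits least-significant first.
  const digits = [];
  for (let j = 0; j < 5; j++) {
    digits.push(ALPHABET[val % 85]);
    val = Math.floor(val / 85);
  }
  return (msbFirst ? digits.reverse() : digits).join("");
}

console.log(encodeGroup("Hell", false)); // "RZ!iC", the start of the wrong output
console.log(encodeGroup("Hell", true)); // "87cUR", the start of the expected output
```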
Jeremy Caris

1 Answer


Character codes are character codes. You can't change the behavior of String.fromCharCode() or String.charCodeAt().

However, you can store your custom character set in an array and use array indexing and Array.indexOf() to look up entries.
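
As a minimal sketch of that idea, assuming the Z85 character set quoted in the question, the two lookups might look like this. The helper names `digitToChar` and `charToDigit` are illustrative, not part of any library.

```javascript
// Custom character set stored as an array, one entry per base-85 digit.
const Z85_ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#".split("");

// Encoding direction: digit value (0..84) to character, by array indexing.
function digitToChar(digit) {
  return Z85_ALPHABET[digit];
}

// Decoding direction: character to digit value, by Array.prototype.indexOf.
// Returns -1 when the character is not in the alphabet.
function charToDigit(ch) {
  return Z85_ALPHABET.indexOf(ch);
}

console.log(digitToChar(10)); // "a"
console.log(charToDigit("A")); // 36
```

For long inputs, a prebuilt object or Map from character to digit would avoid the linear scan that indexOf performs on every lookup.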

Updating this function to work with Z85 takes some care, though, because String.fromCharCode() and String.charCodeAt() are used in two different contexts: in this encoder, charCodeAt() reads the unencoded input (which doesn't need to change), while fromCharCode() produces the encoded output (which does). In a decoder the roles are reversed. You will need to take care not to confuse the two.
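
A minimal sketch of how the encoder from the question could be adapted along these lines, assuming the Z85 character set quoted there. The function name `encodeZ85` is made up for illustration. It keeps charCodeAt() on the input side and swaps `String.fromCharCode(digit + 33)` for an alphabet lookup on the output side. It also keeps the original function's handling of a short final group, although the Z85 specification itself expects input whose length is a multiple of 4.

```javascript
const Z85_ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#";

function encodeZ85(input) {
  let output = "";
  for (let i = 0; i < input.length; i += 4) {
    // Input side: unchanged. Reading past the end yields NaN,
    // which the bitwise operators below treat as 0.
    const c1 = input.charCodeAt(i);
    const c2 = input.charCodeAt(i + 1);
    const c3 = input.charCodeAt(i + 2);
    const c4 = input.charCodeAt(i + 3);
    const chr = ((c1 << 24) | (c2 << 16) | (c3 << 8) | c4) >>> 0;

    // Five base-85 digits, most significant first, with no +33 offset.
    const digits = [
      Math.floor(chr / 85 ** 4) % 85,
      Math.floor(chr / 85 ** 3) % 85,
      Math.floor(chr / 85 ** 2) % 85,
      Math.floor(chr / 85) % 85,
      chr % 85,
    ];

    // Output side: alphabet lookup. As in the original function,
    // n input characters produce n + 1 output characters.
    const n = Math.min(4, input.length - i);
    for (let d = 0; d <= n; d++) {
      output += Z85_ALPHABET[digits[d]];
    }
  }
  return output;
}

console.log(encodeZ85("Hello world!")); // "nm=QNzY<mxA+]nf"
```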

  • Thank you. This is very helpful. Let me play around a bit based on your info. It seems like I should theoretically be able to continue to use charCodeAt as-is to pull the unencoded string, but replace fromCharCode with array indexing to map to the Z85 char set... – Jeremy Caris Mar 14 '19 at 02:53
  • This works, except that my encoded results are not quite correct. I think I may only need to take zeros and spaces into account. But your answer allowed me to change the character set successfully like this: I created the alphabet array `var alphabet = ["0","1","2", etc]`, removed the character offset of 33 from the var calculations such as `enc5 = chr % 85 + 33;`, and then replaced `String.fromCharCode(var)` with `alphabet[var]`. – Jeremy Caris Mar 14 '19 at 12:15
  • I figured it out. When I simplified my array building by just splitting the full string, it outputted the correct encoding. I must have had an error in my first array of characters. I don't know why I didn't think about this method before posting my original question. Over-thinking it I guess! Thanks again! – Jeremy Caris Mar 14 '19 at 14:43
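
The fix described in the last comment, building the array by splitting the full string instead of typing each entry by hand, can be paired with a quick sanity check. This is a sketch under the assumption that the alphabet is the Z85 set quoted in the question; `checkAlphabet` is a hypothetical helper name.

```javascript
// Build the lookup array from the full string rather than by hand.
const alphabet =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#".split("");

// A usable base-85 alphabet has exactly 85 single-character entries,
// with no duplicates.
function checkAlphabet(chars) {
  return (
    chars.length === 85 &&
    chars.every((ch) => ch.length === 1) &&
    new Set(chars).size === 85
  );
}

console.log(checkAlphabet(alphabet)); // true
```

A hand-typed array with a missing, repeated, or multi-character entry would fail this check before it could produce a subtly wrong encoding.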