18

My project at work is using the Jackson JSON serializer to convert a bunch of Java objects into Strings in order to send them to REST services.

Some of these objects contain sensitive data, so I've written custom serializers that serialize these objects to JSON strings, gzip them, and then encrypt them using AES.

This turns the strings into byte arrays, so I use the Base64 encoder in Apache commons codec to convert the byte arrays into strings. The custom deserializers behind the REST interfaces reverse this process:

base64 decode -> decrypt -> decompress -> deserialize using default Jackson deserializer.
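As a rough illustration of the round trip described above, here is a minimal Node.js sketch. It is not the Java/Jackson code from the question: `seal` and `unseal` are hypothetical names, `JSON.stringify` stands in for Jackson, and AES-256-GCM with a random 12-byte nonce prepended to the output is an assumption, since the question does not say which AES mode or key handling is used.

```javascript
const { gzipSync, gunzipSync } = require("zlib");
const { randomBytes, createCipheriv, createDecipheriv } = require("crypto");

// serialize -> compress -> encrypt -> base64 (hypothetical helper)
function seal(obj, key) {
  const compressed = gzipSync(Buffer.from(JSON.stringify(obj), "utf8"));
  const iv = randomBytes(12); // assumption: fresh 12-byte nonce per message
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(compressed), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([iv, tag, body]).toString("base64");
}

// base64 decode -> decrypt -> decompress -> deserialize (hypothetical helper)
function unseal(text, key) {
  const raw = Buffer.from(text, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const body = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  const compressed = Buffer.concat([decipher.update(body), decipher.final()]);
  return JSON.parse(gunzipSync(compressed).toString("utf8"));
}
```

Compressing before encrypting matters here: encrypted output looks random, so gzip applied after AES would gain nothing.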

Base64 encoding increases the size of the output (the gzip step in serialization is meant to help offset this increase), so I checked Google to see if there was a more efficient alternative. That led me to this previous Stack Overflow thread, which brought up Ascii85 encoding as a more efficient alternative:

Base64 adds 33% to the size of the output, whereas Ascii85 adds 25%.
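Those percentages follow from the group sizes: Base64 maps every 3 bytes to 4 characters, and Ascii85 maps every 4 bytes to 5 characters. The sketch below computes the encoded lengths; the function names are made up for illustration, and it ignores the `<~ ~>` delimiters, line breaks, and Ascii85's optional `z` shortcut for all-zero groups.

```javascript
// Encoded length in characters for n input bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3); // 3 bytes -> 4 chars, last group padded with "="
}

function ascii85Length(n) {
  const full = Math.floor(n / 4); // 4 bytes -> 5 chars
  const rest = n % 4;             // k leftover bytes -> k + 1 chars
  return 5 * full + (rest ? rest + 1 : 0);
}

for (const n of [12, 1000, 1000000]) {
  console.log(n, base64Length(n), ascii85Length(n));
}
```

For 1000 bytes this gives 1336 characters for Base64 and 1250 for Ascii85, so the Ascii85 text is only about 6% shorter than the Base64 text.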

I found a few Java Ascii85 implementations, e.g. in Apache PDFBox, but I'm a bit leery of using the encoding: hardly anybody seems to use or implement it, which might just mean that Base64 has more inertia, or might instead mean that there's some wonky problem with Ascii85.

Does anybody know more on this subject? Are there any problems with Ascii85 that mean that I should use Base64 instead?

Zim-Zam O'Pootertoot
  • 2
    Why not just invoke the REST services over HTTPS instead of using home-made encryption on parts of the message? – Samuel Rossille Nov 15 '12 at 21:08
  • We're using HTTPS for the REST calls. The reason we're encrypting the data is that most of the messages also spend time in an Amazon Web Services Simple Queue Service queue, which only accepts strings. The people with access to the queue are not the same people who have access to the encryption keys. – Zim-Zam O'Pootertoot Nov 15 '12 at 21:22

3 Answers

19

Base64 is way more common. The difference in size really isn't that significant in most cases, and if you add compression at the HTTP level (which will compress the base64) instead of within your payload, you may well find the difference goes away entirely.
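One way to check this for a given payload is to gzip the Base64 text and compare sizes. The sketch below is a rough experiment rather than a benchmark: random bytes stand in for encrypted data (an assumption), and `gzipSync` stands in for whatever compression the HTTP layer applies.

```javascript
const { gzipSync } = require("zlib");
const { randomBytes } = require("crypto");

// Random bytes stand in for the encrypted payload, which should be incompressible.
const payload = randomBytes(64 * 1024);
const asBase64 = Buffer.from(payload.toString("base64"), "ascii");
const onTheWire = gzipSync(asBase64);

console.log("raw bytes:      ", payload.length);
console.log("base64 chars:   ", asBase64.length);
console.log("gzipped base64: ", onTheWire.length);
```

Because Base64 text uses only 64 distinct symbols, each character carries 6 bits of information, and a Huffman-based compressor can recover most of the 8-bits-per-character overhead.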

Are there any problems with Ascii85 that mean that I should use Base64 instead?

I would strongly advise using base64 just because it's so much more widespread. It's pretty much the canonical way of representing binary data as text (unless you want to use hex, of course).

Jon Skeet
  • 5
    More common does not mean better. If it has less overhead in data size, maybe it should be used instead. An 8% saving in data transfer for 1 GB means about 85 MB less traffic, though of course that depends on how much data you need to transfer. Software requirements change, and so do data volumes, and changing a file format afterwards can be quite a painful experience. But I suspect that these encodings are not used for heavy data transfer. – TarmoPikaro Feb 26 '16 at 23:28
  • 2
    @TarmoPikaro: More common means "more likely to be implemented correctly in every platform you're likely to need it for" though. And yes, if you're transferring huge amounts of data, you'd be better off working out a way to avoid performing a text encoding at all - saving both bandwidth *and* CPU encoding/decoding time. – Jon Skeet Feb 26 '16 at 23:32
  • Typically an implementation begins with a reference implementation, so if we have one working solution, it will most likely be cloned to more platforms. From what I have briefly checked on the wiki pages, neither ASCII85 nor MIME Base64 has any visible advantage over the other. Then it comes down to how easy each encoding is to use: for example, Windows has a built-in CryptBinaryToString function that can generate MIME Base64 right away. – TarmoPikaro Feb 26 '16 at 23:51
  • (I reverted the edit that used code font for non-code items. It generally makes things much harder to read.) – Jon Skeet Sep 27 '19 at 10:26
9

ASCII85 is a nice encoding to use to save that extra bit of space. But it outputs many characters that would need to be escaped if naively sent over HTTP. Base64 encoding has a variant that can be sent over HTTP without any escaping.
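To make the escaping point concrete, the sketch below lists which characters of each alphabet `encodeURIComponent` would percent-escape. It assumes "sent over HTTP" means placing the value in a URL or form field, and the helper name `needsEscaping` is made up for illustration.

```javascript
// The 85 characters Ascii85 can emit: "!" (code 33) through "u" (code 117).
const ascii85Alphabet = Array.from({ length: 85 }, (_, i) =>
  String.fromCharCode(33 + i)
).join("");

// The URL-safe Base64 alphabet from RFC 4648 ("-" and "_" replace "+" and "/").
const base64urlAlphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Characters that encodeURIComponent would percent-escape.
const needsEscaping = (alphabet) =>
  [...alphabet].filter((ch) => encodeURIComponent(ch) !== ch);

console.log("Ascii85:  ", needsEscaping(ascii85Alphabet).join(" "));
console.log("base64url:", needsEscaping(base64urlAlphabet).join(" "));

// Node can produce the URL-safe variant directly.
console.log(Buffer.from([0xfb, 0xff, 0xfe]).toString("base64url"));
```

The Ascii85 range also contains `"` and `\`, which must be escaped inside a JSON string, so some of the size saving is lost again when the encoded text is embedded in JSON.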

Here's a JavaScript ASCII85 encoder in case anyone wants to try it:

// By Steve Hanov. Released to the public domain.
function encodeAscii85(input) {
  var output = "<~";
  var chr1, chr2, chr3, chr4, chr, enc1, enc2, enc3, enc4, enc5;
  var i = 0;

  while (i < input.length) {
    // Access past the end of the string is intentional.
    chr1 = input.charCodeAt(i++);
    chr2 = input.charCodeAt(i++);
    chr3 = input.charCodeAt(i++);
    chr4 = input.charCodeAt(i++);

    chr = ((chr1 << 24) | (chr2 << 16) | (chr3 << 8) | chr4) >>> 0;

    enc1 = (chr / (85 * 85 * 85 * 85) | 0) % 85 + 33;
    enc2 = (chr / (85 * 85 * 85) | 0) % 85 + 33;
    enc3 = (chr / (85 * 85) | 0 ) % 85 + 33;
    enc4 = (chr / 85 | 0) % 85 + 33;
    enc5 = chr % 85 + 33;

    output += String.fromCharCode(enc1) +
      String.fromCharCode(enc2);
    if (!isNaN(chr2)) {
      output += String.fromCharCode(enc3);
      if (!isNaN(chr3)) {
        output += String.fromCharCode(enc4);
        if (!isNaN(chr4)) {
          output += String.fromCharCode(enc5);
        }
      }
    }
  }

  output += "~>";

  return output;
}
<input onKeyUp="result.textContent = encodeAscii85(this.value)" placeholder="write text here" type="text">
<p id="result"></p>
Qwerty
Steve Hanov
  • I never wrote a decoder because I didn't need it for my application. – Steve Hanov Jun 02 '16 at 16:53
  • I think you missed a leading `<~` there. I couldn't decode the string without it. Am I wrong? Also I modified your answer and added a snippet ;) – Qwerty Jun 02 '16 at 23:52
  • Ironically this broke when I tried to encode "`Can we have decode too? :)`". It starts getting wonky at `Can we ha`. EDIT: something's odd when running in that frame, but the encoder function itself works properly – enorl76 Dec 28 '18 at 20:18
  • You don't need to fall back to the less efficient, 1950s backwards-compatible 6-bit encoding. With Z85 (https://www.johndcook.com/blog/2019/03/05/base85-encoding/) you can be HTTP friendly too. Open source Z85 encoders and decoders for common languages are easy to find. – Jan Jul 14 '22 at 10:07
3

Here is a matching ASCII85 (AKA Base85) decoder (for user Qwerty) in JavaScript:

function decode_ascii85(a) {
  // Strip the <~ ~> delimiters only when both are present.
  if (a.slice(0, 2) === "<~" && a.slice(-2) === "~>") a = a.slice(2, -2);
  // Drop whitespace and expand every "z" shortcut (a group of four zero bytes).
  a = a.replace(/\s/g, "").replace(/z/g, "!!!!!");
  // Pad the last group to 5 characters with "u"; the extra bytes are cut off below.
  var pad = (5 - a.length % 5) % 5, bytes = [];
  a += "uuuu".slice(0, pad);
  for (var i = 0; i < a.length; i += 5) {
    var n = 0;
    for (var j = 0; j < 5; j++) n = n * 85 + (a.charCodeAt(i + j) - 33);
    bytes.push(n >>> 24 & 255, n >>> 16 & 255, n >>> 8 & 255, n & 255);
  }
  // Remove one byte per padding character added above.
  bytes.length -= pad;
  return String.fromCharCode.apply(String, bytes);
}
<input onKeyUp="result.textContent = decode_ascii85(this.value)" placeholder="insert encoded string here" type="text">
<p id="result"></p>
example: <xmp><~<+oue+DGm>@3BW*D/a<&+EV19F<L~></xmp>
Dave Brown
  • 1
    This doesn't answer the question. – james.garriss Dec 17 '15 at 17:22
  • @james.garriss This allows the OP to test both in a "side by side" manner for his context or implementation, which in turn should answer the question ("Base64 encoding vs Ascii85 encoding" or "Are there any problems with Ascii85 that mean that I should use Base64 instead?"). --- The only other question presented, "Does anybody know more on this subject?", falls into the "too broad" category. --- In other words, I think this is perfectly valid as an answer. Although actual explanations would have been a better answer. – CosmicGiant Aug 28 '16 at 17:44