13

Python 3.4 added the a85encode and b85encode functions (and their corresponding decoding functions) to the base64 module.

What is the difference between the two? The documentation mentions "They differ by details such as the character map used for encoding.", but this seems unnecessarily vague.

orlp

2 Answers

15

a85encode uses the character mapping:

!"#$%&'()*+,-./0123456789:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
abcdefghijklmnopqrstu

with z used as a special case to represent four zero bytes (instead of !!!!!).

b85encode uses the character mapping:

0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
!#$%&()*+-;<=>?@^_`{|}~

with no special abbreviations.
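To make the two mappings concrete, here is a minimal sketch of how a single four-byte group turns into five characters under either alphabet. The helper `encode_group` and the two alphabet constants are illustrative names, not part of the base64 module, and the sketch deliberately ignores the `z` shortcut and the handling of a short final group.

```python
import base64

# Alphabets as listed above (names are hypothetical, for illustration only).
A85_ALPHABET = bytes(range(33, 118))  # '!' through 'u'
B85_ALPHABET = (b"0123456789"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz"
                b"!#$%&()*+-;<=>?@^_`{|}~")


def encode_group(group: bytes, alphabet: bytes) -> bytes:
    """Encode exactly four bytes as five base-85 digits (no 'z' shortcut)."""
    if len(group) != 4 or len(alphabet) != 85:
        raise ValueError("need a 4-byte group and an 85-character alphabet")
    value = int.from_bytes(group, "big")  # treat the group as a 32-bit integer
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)  # peel off the least significant digit
        digits.append(alphabet[digit])
    return bytes(reversed(digits))        # most significant digit comes first


group = b"Man "
print(encode_group(group, A85_ALPHABET), base64.a85encode(group))
print(encode_group(group, B85_ALPHABET), base64.b85encode(group))
```

For a non-zero group like this one, each line should print the same value twice: the arithmetic is identical and only the lookup table differs.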


If you have a choice, I'd recommend a85encode. It's a bit easier (and more efficient) to implement in C, because its character mapping is a contiguous run of ASCII characters in order (each base-85 digit is just the digit value plus 33), and it's slightly more compact for data containing long runs of zero bytes, which isn't uncommon in uncompressed binary data.
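As a rough illustration of the size difference, the following sketch compares the encoded lengths of an all-zero payload and a payload with no all-zero group. The payloads are made up for the example; real data will fall somewhere in between.

```python
import base64

zeros = bytes(1024)            # 1024 zero bytes, i.e. 256 all-zero groups
mixed = bytes(range(256)) * 4  # 1024 bytes with no all-zero 4-byte group

for label, payload in (("zeros", zeros), ("mixed", mixed)):
    a85 = base64.a85encode(payload)
    b85 = base64.b85encode(payload)
    print(f"{label}: a85={len(a85)} bytes, b85={len(b85)} bytes")

# zeros: a85=256 bytes, b85=1280 bytes
# mixed: a85=1280 bytes, b85=1280 bytes
```

The saving only applies to four-byte groups that are entirely zero; for everything else both encodings expand the data by the same 5/4 ratio.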

l'L'l
  • How is Ascii85 more efficient than Base85 for storing zeros exactly? – l'L'l Jan 24 '16 at 00:36
  • `a85encode(b'\0\0\0\0')` is `b'z'`. `b85encode(b'\0\0\0\0')` is `b'00000'`. –  Jan 24 '16 at 00:41
  • That's nice for zeros! But what if you need to encode `z`? I would imagine with Ascii85 you'd be doing `\x7A` quite a bit. I was under the impression the two encodings had the same efficiency overall (4/5). – l'L'l Jan 24 '16 at 01:59
  • @l'L'l I think you're confused. Ascii85 doesn't pass characters through unchanged; `z` isn't used in the character mapping. (It ends at `u`; see above.) So there's no escaping. –  Jan 24 '16 at 05:39
  • @l'L'l You've totally misconstrued the way the mapping works. All bytes, including 0x7a, can be encoded by Ascii85 and Base85. Each one uses a set of 85 *output* characters to encode those bytes in an expanded form. –  Jan 24 '16 at 08:33
9

Ascii85 is the predecessor of Base85; the primary difference between the two is in fact the character sets that are used.

Ascii85 uses the character set:

ASCII 33 ("!") to ASCII 117 ("u") 

Base85 uses the character set:

0–9, A–Z, a–z, !#$%&()*+-;<=>?@^_`{|}~

These characters are specifically not included in Base85:

"',./:[]\

a85encode and b85encode encode/decode Ascii85 and Base85 respectively.
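The character-set claims above can be checked with a short sketch. The alphabet strings are typed in by hand from the lists in this answer (they are not read from the base64 module), and the sample payload is arbitrary.

```python
import base64

# Alphabets copied from the lists above; the constant names are illustrative.
A85_ALPHABET = "".join(chr(i) for i in range(33, 118))  # '!' through 'u'
B85_ALPHABET = ("0123456789"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "abcdefghijklmnopqrstuvwxyz"
                "!#$%&()*+-;<=>?@^_`{|}~")

# Printable, non-space ASCII is 33 ('!') through 126 ('~'): 94 characters.
printable = [chr(i) for i in range(33, 127)]

not_in_b85 = "".join(c for c in printable if c not in B85_ALPHABET)
not_in_a85 = "".join(c for c in printable if c not in A85_ALPHABET)
print("not used by Base85:", not_in_b85)
print("not used by Ascii85 digits:", not_in_a85)

# Either encoding can represent any input byte, including b"z" itself.
sample = b"z\x00\x00\x00\x00z"
for encode, decode in ((base64.a85encode, base64.a85decode),
                       (base64.b85encode, base64.b85decode)):
    assert decode(encode(sample)) == sample
```

Each alphabet leaves out nine of the 94 printable characters; Ascii85 simply drops the last nine (`v` through `~`), while Base85 drops punctuation that tends to need quoting or escaping.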

l'L'l
  • I know this is a late response, but this explanation is seriously confused. Ascii85 encoding was introduced in RFC1924 as an encoding for IPv6 addresses -- however, **this was an April Fool's joke**. It was never intended to be implemented, and has no practical benefits. –  Aug 17 '18 at 01:55
  • Base85 encoding, on the other hand, was first implemented as part of the `btoa` command-line utility, which was published on USENET sometime before 1990. It clearly cannot have been influenced by RFC1924 (which wasn't published until 1996), nor by 21st-century creations like JSON or git. –  Aug 17 '18 at 02:00
  • @duskwuff: There is no confusion; I had mentioned last time the misinformation out there, and since it was promptly disregarded I included some of it in my answer to make a point. Despite RFC 1924 being a joke, the premise of the encoding isn't entirely impractical; Base85 uses the same character set! Interestingly, there are some examples which have shown [it can work](https://codegolf.stackexchange.com/questions/70277/implement-an-encoder-for-rfc-1924-ipv6-addresses). – l'L'l Aug 17 '18 at 05:28