I need to encode a string of about 1000 characters that can contain any byte value (00-FF). I don't want to use hex because it's not dense enough. The problem with base64, as I understand it, is that it includes `+`, `/` and `=`, which are characters I cannot tolerate in my application.

Any suggestions?

Peter Kellner
    Actually, that's not a problem with base64, it's a problem with your application. – JeremyP Dec 09 '10 at 09:17
    Like @JeremyP said. If your application can't tolerate `+` `/` and `=` then you should be very, very concerned. – robertmain Nov 24 '17 at 16:57
    There are plenty of "applications" that cannot tolerate `/`, such as a URL or a filename. Sure, you can escape them, but that just adds another layer that could go wrong or introduce a security flaw. – wisbucky Nov 04 '19 at 19:29

7 Answers

Base58Check is an option. It is starting to become something of a de facto standard in cryptocurrency addresses.

Basic improvements over Base64:

  • Only alphanumeric characters (a subset of [0-9a-zA-Z])
  • No look-alike characters: 0 (zero), O (capital o), I (capital i) and l (lowercase L) are excluded
  • No punctuation to trigger word wrap or line breaks in documents and emails
  • The entire value can be selected with a single double-click, since there is no punctuation

The Bitcoin Address Utility is an example implementation, geared toward Bitcoin.
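For illustration, here is a minimal Python sketch of Base58 with the Base58Check checksum. The question names no platform, so Python is an assumption.

- It treats the input as one big-endian integer and converts it to radix 58.
- It appends the first four bytes of a double SHA-256 as a checksum, as Bitcoin does.
- Bitcoin addresses also prepend a version byte. This sketch leaves that to the caller as part of `payload`.
- The function names are hypothetical, not from any particular library.

```python
import hashlib

# Bitcoin's Base58 alphabet: 0-9, A-Z, a-z minus the look-alikes 0, O, I and l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    # Each leading zero byte is preserved as a leading '1' (the zero digit).
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid characters
    leading_ones = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


def _checksum(payload: bytes) -> bytes:
    # First 4 bytes of SHA-256(SHA-256(payload)).
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58check_encode(payload: bytes) -> str:
    return b58encode(payload + _checksum(payload))


def b58check_decode(text: str) -> bytes:
    raw = b58decode(text)
    payload, checksum = raw[:-4], raw[-4:]
    if len(raw) < 4 or _checksum(payload) != checksum:
        raise ValueError("Base58Check checksum mismatch")
    return payload
```

The big-integer division makes this quadratic in the input length. For a value of about 1000 bytes that is usually acceptable. The output is up to roughly 1366 characters, versus 1334 for unpadded Base64.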

Note: A novel de facto standard may not be adequate for your needs. It is unclear if the Base58Check encoding method will formalise across current protocols.

LateralFractal

Pick your replacement characters. Consider some of the other variants listed in the base64 variants table on Wikipedia.

While base64 encoders/decoders are trivial to write, the character substitution can be done in a simple pre/post-processing step around existing base64 encode/decode functions (inside wrappers), so there is no need to reinvent the wheel entirely. Or, better yet, as Mr. Skeet points out, find an existing library with enough flexibility.
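In Python (assumed here), `base64.b64encode` and `b64decode` already accept an `altchars` argument, so the wrapper can be very small. This sketch makes two assumptions:

- `-` and `_` are acceptable replacements for `+` and `/`.
- The `=` padding can be dropped on encode and recomputed from the length on decode.

```python
import base64

# Hypothetical replacement characters standing in for '+' and '/'.
# Choose any two non-alphanumeric characters your application tolerates.
ALTCHARS = b"-_"


def encode(data: bytes) -> str:
    # Standard base64 with the two substitutions, then drop the '=' padding.
    return base64.b64encode(data, altchars=ALTCHARS).rstrip(b"=").decode("ascii")


def decode(text: str) -> bytes:
    # Restore the padding: a padded base64 string's length is a multiple of 4.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded, altchars=ALTCHARS, validate=True)
```

If `-` and `_` are the characters you want, `base64.urlsafe_b64encode` / `urlsafe_b64decode` (the RFC 4648 "base64url" alphabet) perform the same substitution. You still have to strip and restore the padding yourself.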

If you have no suitable alternative "funny" characters to choose from (perhaps all the other characters are invalid, leaving only the 62 alphanumeric characters), you can always use an escape character. The increase in size is slight: about 3/64 (roughly 4.7%) for uniformly distributed data, since 3 of the 64 symbol values take two characters instead of one. For instance, 0 (A) would be encoded as "AA", 62 (+) as "AB" and 63 (/) as "AC". This too could be done as a pre/post-processing step if you don't want to write your own encoder/decoder from the ground up. The disadvantage of this approach is that the ratio of output characters to input bytes is not fixed.
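Here is a minimal Python sketch of the escape scheme above, built as a pre/post-processing step around the standard `base64` module.

- `A` is the escape character.
- The answer does not say how to handle padding. This sketch assumes the `=` padding is dropped and restored from the length.
- The function names are hypothetical.

```python
import base64

# 'A' is the escape character:
#   value 0  ('A') -> "AA"
#   value 62 ('+') -> "AB"
#   value 63 ('/') -> "AC"
ESCAPES = {"A": "AA", "+": "AB", "/": "AC"}
UNESCAPES = {"A": "A", "B": "+", "C": "/"}


def encode_alnum(data: bytes) -> str:
    std = base64.b64encode(data).decode("ascii").rstrip("=")
    return "".join(ESCAPES.get(ch, ch) for ch in std)


def decode_alnum(text: str) -> bytes:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "A":
            # An escape is always two characters: 'A' plus 'A', 'B' or 'C'.
            # Malformed input raises IndexError or KeyError here.
            out.append(UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    std = "".join(out)
    return base64.b64decode(std + "=" * (-len(std) % 4))
```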

If it's just those particular characters that bother you, and you can find some other characters to use instead, then how about implementing your own custom base64 module? It's not all that difficult.
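A custom base64 is mostly bit shuffling: 8 bits in, 6 bits out. Here is a minimal Python sketch, with two assumptions:

- The alphabet is hypothetical: `.` and `~` replace `+` and `/`.
- No padding is emitted. The decoder simply discards the few leftover zero bits at the end.

```python
# Hypothetical 64-character alphabet: alphanumerics plus '.' and '~'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.~"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode64(data: bytes) -> str:
    out = []
    bits = 0   # bit buffer
    nbits = 0  # number of valid bits currently in the buffer
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            out.append(ALPHABET[(bits >> nbits) & 0x3F])
        bits &= (1 << nbits) - 1  # keep only the unconsumed bits
    if nbits:
        # Left-align the 2 or 4 leftover bits in a final 6-bit group.
        out.append(ALPHABET[(bits << (6 - nbits)) & 0x3F])
    return "".join(out)


def decode64(text: str) -> bytes:
    out = bytearray()
    bits = 0
    nbits = 0
    for ch in text:
        bits = (bits << 6) | INDEX[ch]  # KeyError on invalid characters
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    return bytes(out)
```

Because the first 62 characters are in standard order, the output matches standard base64 with the two substitutions and the padding removed.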

Ciaran Keating

You could use Base32 instead. It is less dense than Base64 (8 output characters per 5 input bytes, rather than 4 per 3), but it eliminates the unwanted characters completely.
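In Python (assumed here), the standard library provides RFC 4648 Base32 as `base64.b32encode` / `b32decode`. Base32 works in blocks of 8 output characters. That means the padding can be stripped on encode and recomputed on decode, leaving only `A-Z` and `2-7`. The function names below are hypothetical.

```python
import base64


def b32encode_nopad(data: bytes) -> str:
    # RFC 4648 Base32 uses only A-Z and 2-7; strip the trailing '=' padding.
    return base64.b32encode(data).decode("ascii").rstrip("=")


def b32decode_nopad(text: str) -> bytes:
    # Padded Base32 is a multiple of 8 characters, so the padding is recoverable.
    return base64.b32decode(text + "=" * (-len(text) % 8))
```

For 1000 input bytes this yields exactly 1600 characters. For comparison, hex gives 2000 and unpadded Base64 gives 1334.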

sharptooth
    Base32 still uses `=`, which he can't use... But he could substitute another character for it, and then only have to worry about 1 instead of 3... – LarryF Dec 09 '10 at 08:16
    @LarryF: Padding can be omitted if length can be detected in some other way, can't it? – sharptooth Dec 09 '10 at 08:18
  • Looks like the `=` padding is optional and can be decoded just fine. Padding is only necessary if you're doing things like concatenating multiple base64 strings. For example, `echo 'foo' | base64` is `Zm9vCg==`. But if you drop the padding, it can still decode `echo 'Zm9vCg' | base64 --decode` to `foo`. – wisbucky Nov 04 '19 at 19:33

Sure. Why not write your own Base64 encoder/decoder, but replace those characters in your algorithm? It won't be decodable by a normal decoder, but if that's not an issue, why worry about it? You had better have at least 3 other characters that ARE usable in your app to represent the `+`, `/` and `=`, though...

LarryF
  • Assuming no padding (normally =) is required, only two non-alphanumeric characters are needed. –  Dec 09 '10 at 07:43
  • Yeah, but I'm not sure that's an assumption you'd want to make... unless he KNOWS for sure his data length will *ALWAYS* be the same. Even then, it doesn't cover future updates: he adds a new field or something, all of a sudden all his B64 code breaks, and he doesn't know why... – LarryF Dec 09 '10 at 08:11

As Ciaran says, base64 isn't terribly hard to implement - but you may want to have a look for existing libraries which allow you to specify a custom set of characters to use. I'm pretty sure there are plenty out there, but you haven't specified which platform you need this for.

Basically, you just need 65 acceptable ASCII characters (64 digits plus one for padding), preferably in addition to line breaks.
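Here is a Python sketch of the "65 acceptable characters" idea (the platform is an assumption). It encodes with the standard library, then maps each of the 65 standard symbols (64 digits plus `=`) onto a custom alphabet, in any order. The alphabet below is hypothetical: any 65 distinct characters your application accepts would do.

```python
import base64

# Standard base64 symbols: 64 digits followed by the '=' padding character.
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
# Hypothetical replacement alphabet: 65 distinct characters, in any order.
CUSTOM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_."
assert len(STANDARD) == len(set(CUSTOM)) == 65

TO_CUSTOM = str.maketrans(STANDARD, CUSTOM)
FROM_CUSTOM = str.maketrans(CUSTOM, STANDARD)


def encode_custom(data: bytes) -> str:
    # str.translate maps every character exactly once, so a fully
    # reordered alphabet works, not just a few substitutions.
    return base64.b64encode(data).decode("ascii").translate(TO_CUSTOM)


def decode_custom(text: str) -> bytes:
    return base64.b64decode(text.translate(FROM_CUSTOM), validate=True)
```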

Jon Skeet
    How is this the accepted answer, it doesn't offer any solution. – xApple Jun 21 '17 at 13:54
    First (and probably only) time I've ever downvoted an answer from Jon Skeet. – JamesQMurphy Jan 17 '21 at 16:06
    @JamesQMurphy: I wouldn't answer it this way now, of course (this was over 10 years ago) - but I can't delete it as it's accepted, and there are other answers that have more detail. – Jon Skeet Jan 17 '21 at 16:16

base62 is like base64, but it uses only the 62 alphanumeric characters ([0-9A-Za-z]).
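62 is not a power of two, so base62 cannot map each 6-bit group to a character the way base64 does. A common approach is to treat the input as one big integer and convert it to radix 62. Here is a minimal Python sketch with two assumptions:

- The digit ordering is digits, then uppercase, then lowercase. Implementations differ on this.
- Leading zero bytes are kept as leading `0` characters.

The function names are hypothetical.

```python
# Assumed digit ordering: 0-9, A-Z, a-z (other base62 implementations differ).
ALPHABET62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def b62encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET62[rem])
    # The most significant digit is never '0', so leading '0's mark zero bytes.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "0" * zeros + "".join(reversed(out))


def b62decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET62.index(ch)  # ValueError on invalid characters
    zeros = len(text) - len(text.lstrip("0"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
```

For 1000 input bytes the output is at most about 1344 characters, slightly more than the 1334 of unpadded base64.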

Sinaesthetic
    This is weird, isn't it? *64 - ['+', '/', '='].length = 61*, right? The padding character '=' is not counted? https://en.wikipedia.org/wiki/Base64#Variants_summary_table – BairDev Oct 01 '21 at 14:14
    @BairDev correct, the padding is not counted in base64 https://en.wikipedia.org/wiki/Base64 – Sinaesthetic Oct 09 '21 at 19:03