63

I am encoding the URL suffix of my application:

$url = 'subjects?_d=1';
echo base64_encode($url);

// Outputs
c3ViamVjdHM/X2Q9MQ==

Notice the slash before 'X2'.

Why is this happening? I thought base64 only outputted A-Z, 0-9 and '=' as padding?

BadHorsie

7 Answers

126

No. The Base64 alphabet includes A-Z, a-z, 0-9 and + and /.

You can replace them if you don't care about portability towards other applications.

See: http://en.wikipedia.org/wiki/Base64#Variants_summary_table

You can use something like these to use your own symbols instead (replace - and _ by anything you want, as long as it is not in the base64 base alphabet, of course!).

The following example converts the normal base64 to base64url as specified in RFC 4648:

function base64url_encode($s) {
    return str_replace(array('+', '/'), array('-', '_'), base64_encode($s));
}

function base64url_decode($s) {
    return base64_decode(str_replace(array('-', '_'), array('+', '/'), $s));
}
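
As a rough sketch of how the two helpers above could be extended, the variant below also strips the trailing `=` padding (which RFC 4648 permits when the length is known implicitly) and restores it before decoding. The function names `b64url_encode_nopad` and `b64url_decode_nopad` are made up for this example and do not come from PHP or from the answer.

```php
<?php
// Hypothetical helper: Base64 with '+' -> '-', '/' -> '_' and the '=' padding removed.
function b64url_encode_nopad($s) {
    return rtrim(strtr(base64_encode($s), '+/', '-_'), '=');
}

// Hypothetical helper: restore padding to a multiple of 4, map back, decode strictly.
function b64url_decode_nopad($s) {
    $padded = str_pad($s, strlen($s) + (4 - strlen($s) % 4) % 4, '=');
    return base64_decode(strtr($padded, '-_', '+/'), true);
}

$token = b64url_encode_nopad('subjects?_d=1');
echo $token, "\n";                      // c3ViamVjdHM_X2Q9MQ
echo b64url_decode_nopad($token), "\n"; // subjects?_d=1
```

The resulting token contains only letters, digits, `-` and `_`, so it can be placed in a URL without further escaping.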
Artefact2
  • Thanks. My app will always be PHP-based, so does it matter if I replace them? – BadHorsie Jul 12 '12 at 10:13
  • As long as you do the opposite transformation before decoding in your application, no. – Artefact2 Jul 12 '12 at 10:14
  • Not such a good idea to use a different scheme than the URL safe encoding as specified in [RFC 4648](http://tools.ietf.org/html/rfc4648#page-7) – Maarten Bodewes May 25 '14 at 13:05
  • This is an old question, but I second @owlstead's comment. Just use `urldecode()` and `urlencode()` on the base64 string. If you do this, you're straying away from the standards. – Spencer D Sep 22 '14 at 15:34
  • @SpencerGrantDoak RFC 4648 *does* specify a different alphabet for base64url that can be created by just replacing characters: `+` becomes `-`, `/` becomes `_`. This is more efficient than URL encoding, which may expand the result quite a lot for certain input (with a lot of bits set to 1). – Maarten Bodewes Sep 22 '14 at 16:45
  • @owlstead, thank you very much for letting me know about that. I knew that URL encoding was of course less memory efficient and tripled the HTTP message size, but I was unaware that the RFC stated those replacing rules as an acceptable standard. Thank you for the information. – Spencer D Sep 22 '14 at 17:02
  • Not working for me, decode generates kind of garbage. – Volatil3 Nov 24 '15 at 08:57
  • Is there any reason + and / were chosen? – Shravya Boggarapu Feb 17 '20 at 13:52
  • What question are you answering with your "No."? – gondo Jun 10 '20 at 07:14
26

In addition to the other answers, which point out that / is part of the standard base64 alphabet, the specific reason you saw a / in your encoded string is this: when base64-encoding printable ASCII text, the only way to produce a / is to have a question mark (0x3F) as the third byte of a 3-byte group, that is, at a 1-based position divisible by three. In 'subjects?_d=1' the ? is the ninth character.
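
A small sketch to illustrate the point, using arbitrary sample strings (`'ab?'`, `'a?b'`, `'?ab'`) chosen only for this example: the `?` yields a `/` only when it is the third byte of a group, because its low six bits (`111111`, index 63) then form the final base64 digit on their own.

```php
<?php
// '?' is 0x3F (binary 00111111). As the third byte of a 3-byte group,
// its low six bits become the last Base64 digit: index 63, which is '/'.
$third  = base64_encode('ab?');
$second = base64_encode('a?b');
$first  = base64_encode('?ab');

echo $third, "\n";  // YWI/
echo $second, "\n"; // YT9i
echo $first, "\n";  // P2Fi

// Check every printable ASCII byte in the third position of the group 'ab_'.
$producers = array();
for ($c = 32; $c <= 126; $c++) {
    if (strpos(base64_encode('ab' . chr($c)), '/') !== false) {
        $producers[] = chr($c);
    }
}
echo implode(',', $producers), "\n"; // ?
```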

Daniel Bejan
Snorbuckle
7

Sorry, you thought wrong. A-Za-z0-9 only gets you 62 characters. Base64 uses two additional characters, in PHP's case / and +.

deceze
5

There is nothing special in that.

The base 64 "alphabet" or "digits" are A-Z,a-z,0-9 plus two extra characters + (plus) and / (slash).

You can later encode / with %2f if you want.
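
As a sketch of that approach, PHP's `rawurlencode()` percent-encodes the `/` and also the `=` padding; the input string is the one from the question.

```php
<?php
$encoded = base64_encode('subjects?_d=1'); // c3ViamVjdHM/X2Q9MQ==
$escaped = rawurlencode($encoded);

echo $escaped, "\n";                                 // c3ViamVjdHM%2FX2Q9MQ%3D%3D
echo base64_decode(rawurldecode($escaped)), "\n";    // subjects?_d=1
```

Note that each escaped character grows from one byte to three, so the string gets longer than the plain base64 form.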

Igor Chubin
5

For base64 the valid charset is: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

The = character is not part of the alphabet; it is used as padding at the end when the input length is not a multiple of three bytes.
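
A quick sketch to check the character count and see the padding in action. The variable names and the sample inputs `'a'`, `'ab'` and `'abc'` are arbitrary choices for this example.

```php
<?php
// Build the standard alphabet: 26 + 26 + 10 + 2 = 64 symbols.
$alphabet = implode('', array_merge(range('A', 'Z'), range('a', 'z'), range('0', '9'), array('+', '/')));

echo strlen($alphabet), "\n"; // 64
echo $alphabet[63], "\n";     // /

// Padding depends on how many bytes are left over in the final group.
$one   = base64_encode('a');   // 1 byte  -> two '=' characters
$two   = base64_encode('ab');  // 2 bytes -> one '=' character
$three = base64_encode('abc'); // 3 bytes -> no padding

echo $one, "\n";   // YQ==
echo $two, "\n";   // YWI=
echo $three, "\n"; // YWJj
```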

M.

poussma
5

Not directly related, and enough people above have answered and explained solutions quite well.

However, going a bit outside the scope of the question: if you want more readable encoded text, try looking into Base58. It's worth considering if you want only alphanumeric characters.

tfont
3

A-Z is 26 characters. 0-9 is 10 characters. = is one character. That gives a total of 37 characters, which is some way short of 64.

/ is one of the 64 characters. You can see a complete list on the Wikipedia page.

Quentin