I am interested in the following:
Is there a list of characters that would never occur as part of a Base64-encoded string?
For example, `*`. I am not sure whether this would occur or not. If the original input actually had `*` as part of it, would that be encoded differently?

- I would look at this page to work it out. http://en.wikipedia.org/wiki/Base64 – Peter Lawrey Nov 02 '12 at 12:24
- The notion that a `*` in the input would be represented as a `*` in the output is bizarre and indicates severe conceptual confusion about the relationship of the input to the output. A `*` could appear in the output if and only if it's a member of the base 64 character set ... regardless of what's in the input. – Jim Balter Jun 07 '18 at 06:41
4 Answers
Here is what I could turn up: RFC 4648
It includes this convenient table:
Table 1: The Base 64 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
So a regular expression that matches any character that should never appear in Base 64 encodings would be:
[^A-Za-z0-9+/=]
However, as kapep's answer points out, this is only the recommendation. Specific implementations might choose a different set of 64 characters. (In fact, even the linked RFC contains an alternative table for URL- and filename-safe encoding, which replaces characters 62 and 63 with `-` and `_` respectively.) So it really depends on the implementation that created the encoding.
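As a rough sketch of how that character class could be applied, assuming Python and the standard RFC 4648 alphabet (the helper name `find_non_base64` is made up for illustration):

```python
import re

# Anything that is NOT in the standard alphabet (A-Z, a-z, 0-9, +, /) or the '=' pad.
NON_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")

def find_non_base64(text):
    """Return every character in `text` that falls outside the standard Base64 set."""
    return NON_BASE64.findall(text)

print(find_non_base64("Kg=="))   # []
print(find_non_base64("Kg*=="))  # ['*']
print(find_non_base64("ab-_"))   # ['-', '_']
```

Note that the last call flags `-` and `_`, so input produced with the URL-safe alphabet would need a different character class.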

- `/` being part of the standard means that this can't be used for naming files. Also, why not start with `0` before `A`? Why make the first ten numbers in the base system purposefully different? – Aaron Franke Jan 03 '19 at 08:35
- I can't answer your second question, but the RFC does provide an alternative encoding that doesn't use `/` and `+` and is specifically designed to be safe for filenames and URLs. – Martin Ender Jan 03 '19 at 12:02
- @MartinEnder By the way, a more appropriate regular expression would be `^[A-Za-z0-9+/]+={0,2}$`. – Victor May 18 '19 at 08:33
- Is there any python function which can return the value of the encoding ? for example, something like base64('A') = 0, base64('O') = 14 – Praveen Parihar Sep 10 '19 at 09:38
- @Praveen, no, because in base64 3 8-bit ASCII characters change into 4 6-bit base64 characters (24 total bits). The encoding of any character will depend on the character before or after it in the original string. – Foo Bar Sep 27 '22 at 13:38
You are probably safe with the other answers in most situations, but according to the Wikipedia article on Base64 there is no definitive list you can rely on:
The particular choice of character set selected for the 64 characters required for the base varies between implementations.
RFC 4648 mentions other alphabets, such as the "URL and Filename safe" Base 64 Alphabet, where `+` and `/` are replaced with `-` and `_`.
There's a table of Base64 variants that use different characters. Keep in mind that there are implementation-specific rules about line separators, which you can find in the same table. Some implementations, such as MIME, even allow (and ignore) characters that are not in the alphabet.
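To illustrate the two RFC 4648 alphabets, here is a small sketch using Python's standard `base64` module (Python is my assumption; nothing above is tied to a language). It also shows that Python's default decoder, much like the MIME behaviour described above, discards characters outside the alphabet unless `validate=True` is passed:

```python
import base64
import binascii

data = b"\xfb\xff"  # chosen so that values 62 and 63 both appear in the output

print(base64.b64encode(data))          # b'+/8='
print(base64.urlsafe_b64encode(data))  # b'-_8='

# Lenient decoding (the default): the stray '*' is silently discarded.
print(base64.b64decode(b"K*g=="))      # b'*'

# Strict decoding: the same input is rejected.
try:
    base64.b64decode(b"K*g==", validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```

So whether a given character "can occur" depends both on which alphabet the encoder used and on how strict the decoder is.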

Base64 only contains `A–Z`, `a–z`, `0–9`, `+`, `/` and `=`.
So the list of characters not to be used is: all possible characters minus the ones mentioned above.
For special purposes `.` and `_` are possible, too.

- Is `=` included in standard base64? It looks like that brings the total number of characters to 65. What is your source for this? edit: it looks like the `=` is for padding, in case data is complete before a frame is finished. – Caleb Hensley Jun 30 '22 at 17:08
https://en.wikipedia.org/wiki/Base64#Design
MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values
So for the most part you should expect only alphanumeric characters. The example table in this article also shows '+' and '/'; you would not see '*' in output that uses this alphabet.
For example, you can use http://www.motobit.com/util/base64-decoder-encoder.asp to convert to Base64; for '*' it returns "Kg==".
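The same result can be reproduced locally. A minimal sketch in Python (my choice of language; `ALPHABET` is just a helper name for the Table 1 ordering from RFC 4648):

```python
import base64
import string

# The 64 characters in RFC 4648 Table 1 order: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

encoded = base64.b64encode(b"*").decode("ascii")
print(encoded)  # Kg==

# Each non-padding output character is just an index into the table.
print([ALPHABET.index(c) for c in encoded.rstrip("=")])  # [10, 32]

# Encoding every possible byte value still only yields alphabet characters plus '='.
used = set(base64.b64encode(bytes(range(256))).decode("ascii"))
print(used <= set(ALPHABET + "="))  # True
```

The `*` in the input never shows up as a `*` in the output; it only influences which table entries get picked.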
