There's one problem in your question: There are several encodings for Base64, depending on the extra nonalphanumeric characters used in the string.
Base64 encodings use all uppercase ASCII letters, all lowercase letters and the digits (this makes 26 + 26 + 10 = 62 chars), plus two more that can be, depending on what you are using the base64 encoding for, {'+', '/'}, {'.', '-'}, {'.', '_'} or some other pair (see here for a thorough explanation).
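To make the difference between alphabets concrete, here is a small Python sketch. It is only an illustration: Python's standard library happens to ship the {'+', '/'} alphabet and a URL-safe one that uses {'-', '_'} (a pair not in the list above), and the two sample bytes are chosen by me so that both extra characters show up.

```python
import base64

# Two bytes chosen (for illustration) so the encoding needs both extra characters.
data = bytes([0xFB, 0xFF])

# Standard alphabet: the two extra characters are '+' and '/'.
standard = base64.b64encode(data).decode("ascii")

# URL-safe alphabet from Python's standard library: '-' and '_' instead.
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

The same two bytes produce different strings, so a regex written for one alphabet will reject perfectly good data written in another.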
Another issue is that long Base64 strings normally have their line length restricted to 76 chars, so they contain interspersed newlines (with or without the \r of the CRLF pair) up to the final line, which can end in one or two '=' chars.
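As a rough illustration of the line wrapping, Python's `base64.encodebytes` produces MIME-style output with a newline after at most 76 characters (it uses a bare `\n`, not CRLF). The 100-byte payload below is an arbitrary sample of mine.

```python
import base64

payload = bytes(100)  # 100 zero bytes, an arbitrary sample input

wrapped = base64.encodebytes(payload)  # newline after every 76 output chars
unwrapped = base64.b64encode(payload)  # same data on a single line

lines = wrapped.splitlines()
line_lengths = [len(line) for line in lines]
print(line_lengths)  # [76, 60]

# Removing the line breaks gives back the single-line encoding.
assert b"".join(lines) == unwrapped
```

Any validator for this kind of input has to tolerate those embedded line breaks, which is why the regex further down allows whitespace before every character.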
Also, some (not all) base64 strings end with one or two '=' chars, depending on the total number of data chars (mod 4). This padding is not optional in the standard encoding, but some variants (e.g. for URLs) leave out the final equal signs.
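The padding rule can be seen with three short inputs. This is a minimal Python sketch; `restore_padding` is a hypothetical helper name of mine, and it assumes the unpadded length is valid (never 1 mod 4).

```python
import base64

# One, two and three input bytes give two, one and zero '=' characters.
encoded = {raw: base64.b64encode(raw).decode("ascii") for raw in (b"a", b"ab", b"abc")}
for raw, text in encoded.items():
    print(raw, text)


def restore_padding(unpadded: str) -> str:
    """Re-append the '=' chars that a padding-free variant left out.

    Hypothetical helper: assumes len(unpadded) % 4 is 0, 2 or 3.
    """
    return unpadded + "=" * (-len(unpadded) % 4)


print(restore_padding("YQ"))   # YQ==
print(restore_padding("YWI"))  # YWI=
```

With two data chars left over you need two '=' signs, with three left over you need one, and with none left over there is no padding at all.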
If you intend to parse the +/ alphabet (as used for MIME encoding), then a valid (and strict) regex for base64 can be:
(((\r?\n|\s)*[A-Za-z0-9+\/]){4})*(((\r?\n|\s)*[A-Za-z0-9+\/]){2}((\r?\n|\s)*=){2}|((\r?\n|\s)*[A-Za-z0-9+\/]){3}((\r?\n|\s)*=){1})?
but think twice before using it: it will match the longest base64 string possible (it cannot analyse the context) and ignore any extra chars after it. So for an invalid base64 string like:
ABCDE
(5 characters, while a base64 string has to be a multiple of four characters long, including the final '='s), it will match the first four ("ABCD") as valid base64, since that is the longest base64 string it can match. For that string to be valid, it should have been encoded as ABCDEA== (assuming the missing two bits of the last byte are zeros). See the demo above for a sample of this. The empty string is also matched (it is a valid zero-length base64 string).
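The prefix-matching behaviour is easy to reproduce. Below is a sketch in Python (my choice of language; the question does not name one) that uses the regex exactly as written above. The names `BASE64_RE`, `matched_prefix` and `is_whole_match` are mine.

```python
import re

# The unanchored pattern from above, split over three lines for readability.
BASE64_RE = re.compile(
    r"(((\r?\n|\s)*[A-Za-z0-9+\/]){4})*"
    r"(((\r?\n|\s)*[A-Za-z0-9+\/]){2}((\r?\n|\s)*=){2}"
    r"|((\r?\n|\s)*[A-Za-z0-9+\/]){3}((\r?\n|\s)*=){1})?"
)


def matched_prefix(text: str) -> str:
    """Return what the pattern matches at the start of text.

    The pattern can match the empty string, so match() never returns None.
    """
    return BASE64_RE.match(text).group(0)


def is_whole_match(text: str) -> bool:
    """True only if the pattern consumes the entire string."""
    return BASE64_RE.fullmatch(text) is not None


print(repr(matched_prefix("ABCDE")))  # 'ABCD'
print(repr(matched_prefix("")))       # ''
print(is_whole_match("ABCDE"))        # False
print(is_whole_match("ABCDEA=="))     # True
```

Asking for a full match (or anchoring the pattern) is what turns "finds some base64 at the start" into "the whole string is base64".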
NOTE
A good base64 decoder will not only parse the string the same way the regex matcher does, but will also produce the binary string it represents, with very little extra effort. So I recommend not using a regex matcher in this case, except as an exercise or perhaps in a JavaScript validator in the client browser, to check the format before sending base64-encoded strings to a server (which will need to decode them anyway).
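As a sketch of the "just decode it" approach, here is what it could look like with Python's standard `base64` module. The function name `decode_base64` is hypothetical, and stripping all whitespace first is my assumption, made to mimic the MIME-style line breaks the regex tolerates.

```python
import base64
import binascii
from typing import Optional


def decode_base64(text: str) -> Optional[bytes]:
    """Return the decoded bytes, or None if text is not valid +/ base64."""
    # Assumption: whitespace and line breaks are allowed anywhere, as in MIME.
    compact = "".join(text.split())
    try:
        # validate=True rejects characters outside the standard alphabet.
        return base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None


print(decode_base64("ABCDE"))             # None (bad length)
print(decode_base64("YWJj\r\nYQ==\r\n"))  # b'abca'
```

Validation and decoding happen in one step here, so there is no second pass over the data.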
NOTE 2
The following is a good test for base64 strings: it allows only whitespace between the beginning of the line and the base64-encoded string, and between the end of the encoded string and the end of the line (forcing the base64 encoding onto its own lines). This makes it a stronger test:
^(((\r?\n|\s)*[A-Za-z0-9+\/]){4})*(((\r?\n|\s)*[A-Za-z0-9+\/]){2}(=(\r?\n|\s)*){2}|((\r?\n|\s)*[A-Za-z0-9+\/]){3}(=(\r?\n|\s)*))?$
See demonstration here
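The anchored pattern can be tried out the same way. This is a sketch under Python's `re` semantics (other engines may treat `^` and `$` slightly differently); `STRICT_BASE64_RE` and `looks_like_base64` are names I made up.

```python
import re

# The anchored pattern from NOTE 2, split over three lines for readability.
STRICT_BASE64_RE = re.compile(
    r"^(((\r?\n|\s)*[A-Za-z0-9+\/]){4})*"
    r"(((\r?\n|\s)*[A-Za-z0-9+\/]){2}(=(\r?\n|\s)*){2}"
    r"|((\r?\n|\s)*[A-Za-z0-9+\/]){3}(=(\r?\n|\s)*))?$"
)


def looks_like_base64(text: str) -> bool:
    """True if the whole text passes the anchored pattern."""
    return STRICT_BASE64_RE.match(text) is not None


print(looks_like_base64("ABCDE"))               # False
print(looks_like_base64("ABCDEA=="))            # True
print(looks_like_base64("  YWJj\nYQ==  \n"))    # True
```

Unlike the unanchored version, "ABCDE" is now rejected outright instead of being reported as a match on "ABCD".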