Could not find a way to decode a RFC2045 format base64 string

I am trying to decode an RFC 2045 format base64 string in Python 3, but I cannot find a way to get the same result as org.apache.commons.codec.binary.Base64.decodeBase64.

Here is the Java code:

import static org.apache.commons.codec.binary.Base64.decodeBase64;

String str1 = "j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc";
byte[] b1 = decodeBase64(str1);
System.out.println(b1.length + " " + java.util.Arrays.toString(b1));

And the Python 3 code:

import base64
from email import base64mime

def bytes2list(bdata):
    return [b if b < 128 else b - 256 for b in bdata]

b64str = 'j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc'
b64str += "=" * ((4 - len(b64str) % 4) % 4)
b1 = base64.b64decode(b64str)
b2 = base64mime.decode(b64str)
print(len(b1), bytes2list(b1))
print(len(b2), bytes2list(b2))

The Java program output: 32 [-113, -31, 71, -8, 95, 127, -4, 34, 34, -61, 40, 52, -93, 112, 54, -104, -89, -27, 68, 25, -15, 101, 35, 48, -110, -48, -55, 67, 43, -47, 122, -9]

Python output (identical for both base64.b64decode and base64mime.decode): 29 [-116, 81, -59, -12, 34, 34, -61, 40, 52, -93, 112, 54, -104, -89, -27, 68, 25, -15, 101, 35, 48, -110, -48, -55, 67, 43, -47, 122, -9]

I would not expect this to be a rare situation, but I cannot find a way to get it right. Any hints?

hanaZ
  • `str1 = 'j-...'` is not valid in Java **and** `jshell> Base64.getDecoder().decode("j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc") | Exception java.lang.IllegalArgumentException: Illegal base64 character 2d` (minus is not valid Base64) – user85421 Mar 25 '19 at 14:42
  • being more lenient: `jshell> Base64.getMimeDecoder().decode("j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc") ==> byte[29] { -116, 81, -59, -12, 34, 34, -61, 40, 52, -93, 112, 54, -104, -89, -27, 68, 25, -15, 101, 35, 48, -110, -48, -55, 67, 43, -47, 122, -9 }` - consider using the decoder that comes with Java... – user85421 Mar 25 '19 at 14:47
  • I used Groovy to run the Java code. Sorry, I missed the single vs. double quote issue. – hanaZ Mar 26 '19 at 00:24
  • The data was encoded with RFC 2045 and only Apache Commons decoded it correctly. I need to do the same thing in Python 3. – hanaZ Mar 26 '19 at 00:32
  • The data was not encoded per RFC 2045. It was encoded per RFC 4648 section 5, using the "URL and Filename Safe Alphabet" with '-' and '_' in place of the usual '+' and '/' characters. To decode this data in Python, use the `urlsafe_b64decode` method of the `base64` module. – ottomeister Mar 26 '19 at 00:42
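
To illustrate the last comment, here is a minimal sketch of why the standard decoder returned 29 bytes instead of 32. By default `base64.b64decode` silently discards characters outside the standard alphabet, so the four '-' and '_' characters are dropped before decoding. The variable names (`padded`, `lenient`, `strict_failed`, `via_altchars`) are illustrative only; `validate` and `altchars` are existing parameters of `base64.b64decode`.

```python
import base64
import binascii

b64str = "j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc"
# Pad to a multiple of 4 characters, as in the question.
padded = b64str + "=" * (-len(b64str) % 4)

# Default (lenient) mode: '-' and '_' are not in the standard alphabet,
# so they are discarded and only 39 data characters get decoded.
lenient = base64.b64decode(padded)
print(len(lenient))  # 29

# Strict mode rejects the same input instead of dropping characters.
try:
    base64.b64decode(padded, validate=True)
    strict_failed = False
except binascii.Error:
    strict_failed = True
print(strict_failed)  # True

# Telling the decoder that '-' and '_' stand in for '+' and '/'
# keeps all 43 data characters.
via_altchars = base64.b64decode(padded, altchars=b"-_")
print(len(via_altchars))  # 32
```

Passing `validate=True` is a cheap way to surface an alphabet mismatch like this one early, rather than getting a shorter result with no error.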

1 Answer

I found the answer in "How to decode base64 url in python?". The code should be:

b1 = base64.urlsafe_b64decode(b64str)
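
For completeness, a self-contained sketch of the whole decode, assuming the input may arrive without '=' padding as it does in the question. The helper names `decode_urlsafe` and `to_signed` are hypothetical, introduced only for this example; `to_signed` mirrors the question's `bytes2list` so the result can be compared with Java's signed bytes.

```python
import base64

def decode_urlsafe(text):
    """Decode URL-safe base64 (RFC 4648 section 5), adding any missing '=' padding."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

def to_signed(data):
    """Map each byte to Java's signed range (-128..127) for comparison."""
    return [b - 256 if b > 127 else b for b in data]

b64str = "j-FH-F9__CIiwyg0o3A2mKflRBnxZSMwktDJQyvRevc"
decoded = decode_urlsafe(b64str)
print(len(decoded), to_signed(decoded))
```

This yields 32 bytes starting with -113, -31, 71, which matches the Apache Commons output shown in the question.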
hanaZ