Is this a bug in the passlib base64 encoding?

Question

I am trying to decode and re-encode a bytestring using passlib's base64 encoding:

from passlib.utils import binary

engine = binary.Base64Engine(binary.HASH64_CHARS)
s2 = engine.encode_bytes(engine.decode_bytes(b"1111111111111111111111w"))

print(s2)

This prints b'1111111111111111111111A' which is of course not what I expected. The last character is different.

Where is my mistake? Is this a bug?

score 1 · Accepted Answer · answered May 20 '21 at 22:03

No, it's not a bug.

In all variants of Base64, every encoded character represents just 6 bits, so depending on the number of bytes encoded you can end up with 0, 2 or 4 insignificant bits at the end.
In this case the encoded string 1111111111111111111111w is 23 characters long, which means 23*6 = 138 bits. These decode to 17 bytes (136 bits) plus 2 insignificant bits.
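
To check the arithmetic above for other lengths, here is a small sketch in plain Python. The helper name decoded_size is made up for this illustration and is not part of passlib.

```python
def decoded_size(n_chars):
    """Return (whole_bytes, leftover_bits) for n_chars 6-bit characters."""
    total_bits = n_chars * 6
    # divmod splits the bit count into full bytes and the bits left over.
    return divmod(total_bits, 8)


n_bytes, spare_bits = decoded_size(23)
print(n_bytes, spare_bits)  # 17 2
```

A 24-character string would give 18 bytes with nothing left over, which is why the effect only shows up for some lengths.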

The encoding you use here is not Base64 but Hash64:

Base64 character map used by a number of hash formats; the ordering is wildly different from the standard base64 character map.

In the character map
HASH64_CHARS = u("./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
we find A at index 12 (001100) and w at index 60 (111100).
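
You can confirm those two indices yourself without passlib. In this sketch the character map is copied in as a plain string instead of being imported:

```python
# Same 64 characters as passlib's HASH64_CHARS, copied here as a plain str.
HASH64_CHARS = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

for ch in "Aw":
    idx = HASH64_CHARS.index(ch)
    # Show the index and its 6-bit binary form, most significant bit first.
    print(ch, idx, format(idx, "06b"))
# A 12 001100
# w 60 111100
```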

Now the 'trick' here is that
binary.Base64Engine(binary.HASH64_CHARS) has a default parameter big=False, which means that the encoding is done in little-endian format by default.

In your example this means that, written least significant bit first, w is 001111 and A is 001100. During decoding the last two bits are cut off, since they are not needed, as explained above. When you encode the result again, A is taken as the first character in the character map that can be used to encode 0011 plus two insignificant bits.
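
To make the round trip concrete, here is a minimal pure-Python sketch of a little-endian 6-bit codec. It is not passlib's actual implementation; the names decode_le and encode_le are invented for this example, and it assumes the leftover bits are written back as zeros, which matches the output you observed.

```python
HASH64_CHARS = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_le(text):
    """Decode 6-bit characters, the first character filling the lowest bits."""
    value = 0
    for pos, ch in enumerate(text):
        value |= HASH64_CHARS.index(ch) << (6 * pos)
    n_bytes = len(text) * 6 // 8
    # Masking to whole bytes throws away the leftover high bits.
    value &= (1 << (8 * n_bytes)) - 1
    return value.to_bytes(n_bytes, "little")


def encode_le(data):
    """Encode bytes back to characters; missing high bits are zero."""
    value = int.from_bytes(data, "little")
    n_chars = (len(data) * 8 + 5) // 6  # round up to whole characters
    return "".join(
        HASH64_CHARS[(value >> (6 * pos)) & 0x3F] for pos in range(n_chars)
    )


original = "1111111111111111111111w"
print(encode_le(decode_le(original)))  # 1111111111111111111111A
```

Both the w and the A version decode to the same 17 bytes, so nothing is lost from the data itself. Only the two padding bits in the last character change.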
