When I reviewed the Cryptodome source code, I found that Latin-1 encoding is used, with a comment noting that UTF-8 would cause some side effects we don't want.

For example, py3compat.py in Cryptodome encodes and decodes strings as follows:

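def tobytes(s):
    # bytes pass through unchanged
    if isinstance(s, bytes):
        return s
    else:
        # str is encoded as Latin-1: one byte per character
        if isinstance(s, str):
            return s.encode("latin-1")
        else:
            # anything else is treated as a single integer byte value
            return bytes([s])

def tostr(bs):
    # each byte becomes the character with the same code point
    return bs.decode("latin-1")
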
Nash

1 Answer

The reason is probably simple: Python 2 handles strings as bytes. By default, Python 2 expects ASCII-only source code, but Latin-1 encoded source files may well be present, and the literal representation of the bytes depends on the encoding of the source file.

So you need to use Latin-1 to be as compatible as possible with byte representations generated by older applications. Of course, it is generally best to default to UTF-8 nowadays. I strongly recommend explicitly encoding characters as UTF-8 instead of relying on any defaults.

The fact that this is present in a file called py3compat.py is not a coincidence.
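
To make the difference concrete, here is a minimal sketch (the variable names are mine and not taken from Cryptodome). It relies on the fact that Latin-1 maps each byte value 0 to 255 to the code point with the same number, so arbitrary bytes survive a decode/encode round trip, whereas UTF-8 rejects some byte sequences and uses two bytes for code points above 127.

```python
# All 256 possible byte values, as raw binary data might contain them.
all_bytes = bytes(range(256))

# Latin-1: every byte decodes to the code point of the same value,
# and encoding the result gives back the original bytes.
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")
same_code_points = [ord(c) for c in as_text] == list(range(256))

# UTF-8: not every byte sequence is valid, so decoding arbitrary bytes fails.
try:
    all_bytes.decode("utf-8")
    utf8_decodes = True
except UnicodeDecodeError:
    utf8_decodes = False

# UTF-8 also changes the length: U+00FF becomes two bytes instead of one.
latin1_len = len("\xff".encode("latin-1"))
utf8_len = len("\xff".encode("utf-8"))

# For real text (as opposed to byte-like data), name the encoding explicitly.
explicit_utf8 = "héllo".encode("utf-8")
```

For byte-like data that merely passes through a str, Latin-1 is therefore lossless; for actual text, the explicit UTF-8 call on the last line is the safer habit.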

Maarten Bodewes