Why does Python Cryptodome use Latin-1 to encode and decode strings?

Question

When I reviewed the Cryptodome code, I found that Latin-1 encoding is used, with a comment saying that UTF-8 would cause some side effects we don't want.

For example, py3compat.py in Cryptodome encodes and decodes strings as follows.

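def tobytes(s):
    # bytes pass through unchanged
    if isinstance(s, bytes):
        return s
    else:
        if isinstance(s, str):
            # each code point 0-255 becomes the single byte with the same value
            return s.encode("latin-1")
        else:
            # anything else is wrapped in a list, e.g. one int in range 0-255
            return bytes([s])

def tostr(bs):
    # each byte becomes the code point with the same value
    return bs.decode("latin-1")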

@kelalaka: OP refers to [this](http://pydoc.net/pycryptodomex/3.4.6/Cryptodome.Util.py3compat/) code — Qiu, Oct 18 '18 at 06:15
https://stackoverflow.com/questions/47968578/python3-utf-8-decode-issue — kelalaka, Oct 18 '18 at 11:58
Hi, you can find the code here: https://github.com/Legrandin/pycryptodome/blob/master/lib/Crypto/Util/py3compat.py#L104 — Nash, Oct 18 '18 at 12:04

score 1 · Accepted Answer · answered Oct 18 '18 at 16:04

The reason is probably simple. Python 2 handles strings (str) as bytes. By default, Python 2 source code should be ASCII-only, but Latin-1 encoded source may well be present. The bytes that a string literal produces depend on the encoding of the source file.

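So you need to use Latin-1 to be as compatible as possible with byte representations generated by older applications. Latin-1 maps each code point from 0 to 255 to the single byte with the same value, so a byte string converts to and from a Python 3 str without any byte changing, whereas UTF-8 turns every code point above 127 into two bytes. Of course, it is generally best to default to UTF-8 nowadays. I strongly recommend explicitly encoding text as UTF-8 instead of relying on any defaults.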
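
To make the difference concrete, here is a rough sketch of my own (not code from Cryptodome; the variable names are made up for illustration). It assumes Python 3 and only uses the built-in bytes and str methods. It checks that Latin-1 round-trips every possible byte value, while UTF-8 changes the bytes on encoding and rejects some byte sequences on decoding.

```python
# Illustrative sketch only; not taken from the Cryptodome source.
all_bytes = bytes(range(256))  # every possible byte value, 0x00 to 0xFF

# Latin-1: byte value n <-> code point n, so the round trip is lossless.
text = all_bytes.decode("latin-1")
assert len(text) == 256
assert text.encode("latin-1") == all_bytes

# UTF-8: code points above 127 are encoded as two bytes, so the bytes change.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes != all_bytes
assert len(utf8_bytes) == 384  # 128 one-byte + 128 two-byte sequences

# Arbitrary binary data is not necessarily valid UTF-8 at all.
try:
    b"\xff\xfe".decode("utf-8")
except UnicodeDecodeError:
    utf8_rejected = True
else:
    utf8_rejected = False
assert utf8_rejected
```

This is presumably the kind of side effect the comment in py3compat.py is worried about: binary data such as keys or ciphertext that passes through a str must come back byte for byte.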

The fact that this is present in a file called py3compat.py is not a coincidence.
