Difference between bytes.fromhex() and encode()

Question

key = '140b41b22a29beb4061bda66b6747e14' # hex-encoded

>>> bytes.fromhex(key)
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

This seems to be correct, since the CBC cipher code I wrote works with this value.

The code below was inspired by this site.

>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode()
b'\x14\x0bA\xc2\xb2*)\xc2\xbe\xc2\xb4\x06\x1b\xc3\x9af\xc2\xb6t~\x14'

So my question is: why do the two approaches produce different output and, more importantly, why has the length grown from 16 bytes to 21 bytes in the second case?

Martijn Pieters · Accepted Answer · 2017-12-09T09:46:10.080


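chr() turned each byte value into a Unicode character, and you then encoded that text to UTF-8 (the default encoding if you don't specify one). UTF-8 needs two bytes for every codepoint from U+0080 through U+07FF. For example, the B2 hex value is converted to the Unicode codepoint U+00B2, which encodes to UTF-8 as C2 B2. Your key contains five values of 0x80 or above (B2, BE, B4, DA, B6), so the result grew from 16 bytes to 21.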
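As a rough sketch of where the extra bytes come from, the snippet below counts the values in the question's key that are 0x80 or above and compares that with the UTF-8 length. The variable names (values, high, utf8) are only illustrative.

```python
key = '140b41b22a29beb4061bda66b6747e14'  # the key from the question

# Split the hex string into integer byte values.
values = [int(key[i:i + 2], 16) for i in range(0, len(key), 2)]

# Codepoints up to U+007F take one UTF-8 byte; U+0080..U+00FF take two.
high = [v for v in values if v >= 0x80]

# str.encode() defaults to UTF-8.
utf8 = "".join(chr(v) for v in values).encode()

print(chr(0xB2).encode())                 # b'\xc2\xb2'
print(len(values), len(high), len(utf8))  # 16 5 21
```

Each high value costs exactly one extra byte, so the encoded length is 16 + 5 = 21.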

You need to encode as Latin-1 if you want matching bytes for the Unicode codepoints:

>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode('latin1')
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

The first 256 codepoints of Unicode correspond one-to-one with the Latin-1 (ISO-8859-1) standard, so U+00B2 encodes directly to the single byte B2.

If you want to convert the hex pairs to bytes, do not create Unicode text at all. Just pass the integers directly to bytes:
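The one-to-one mapping can be checked directly. This is a minimal sketch; all_bytes and text are illustrative names, not part of the original code.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as Latin-1 gives codepoints equal to the byte values...
text = all_bytes.decode('latin1')
print([ord(c) for c in text] == list(range(256)))  # True

# ...and encoding as Latin-1 gives the original bytes back.
print(text.encode('latin1') == all_bytes)          # True

# Codepoints beyond U+00FF have no Latin-1 representation.
try:
    chr(0x100).encode('latin1')
except UnicodeEncodeError:
    print('U+0100 cannot be encoded as Latin-1')
```

This only works because every value produced from a two-digit hex pair is at most 0xFF.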

>>> bytes(int(key[i:i + 2], 16) for i in range(0, len(key), 2))
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

That way you don't have to translate back from Unicode to bytes.
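For comparison, here is a short sketch that puts the approaches from this page side by side and checks that the ones avoiding UTF-8 agree. The bytes.hex() round trip at the end is an extra check that is not in the original answer and assumes Python 3.5 or later.

```python
key = '140b41b22a29beb4061bda66b6747e14'  # the key from the question

# Built-in hex parser.
from_hex = bytes.fromhex(key)

# Integers passed straight to bytes(), no text involved.
from_ints = bytes(int(key[i:i + 2], 16) for i in range(0, len(key), 2))

# Via Unicode text, then encoded as Latin-1.
from_latin1 = "".join(
    chr(int(key[i:i + 2], 16)) for i in range(0, len(key), 2)
).encode('latin1')

print(from_hex == from_ints == from_latin1)  # True
print(len(from_hex))                         # 16
print(from_hex.hex() == key)                 # True
```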

edited Dec 09 '17 at 09:46

answered Dec 09 '17 at 09:33


Can you point me towards how *B2 hex value is converted to a Unicode codepoint U+00B2*? I think this is all I need to understand. – Miraj50 Dec 09 '17 at 09:41
@Miraj50: `chr()` produces Unicode text, not bytes. So `chr(int('B2', 16))` produces a Unicode character that in the Unicode standard is referred to as [U+00B2 SUPERSCRIPT TWO](https://codepoints.net/U+00B2). – Martijn Pieters Dec 09 '17 at 09:43
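A quick way to see this for yourself, as a sketch using the standard unicodedata module (not mentioned in the comment above):

```python
import unicodedata

ch = chr(int('B2', 16))      # a str of length 1, not bytes

print(type(ch).__name__)     # str
print(hex(ord(ch)))          # 0xb2
print(unicodedata.name(ch))  # SUPERSCRIPT TWO

# To get the raw byte instead, build bytes from the integer.
print(bytes([0xB2]))         # b'\xb2'
```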
