1

I want to encrypt text using PyCrypto AES and write the resulting ciphertext to a text file. encrypt() returns bytes, so I have to decode the output to a Unicode string before the write() method will accept it. My problem is that both str() and decode() raise the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 0: invalid continuation byte

Here's part of the encrypted text I'm trying to decode to utf-8:

...\xa5- \xd9\x14_\x02\x18\x96\xde\xbb\xad\xb57>\xe5i\x82H\x9b\xcc\x19y\x0f\x89\x0c~\x81\xb5(\xcc|6\x0b\x1c\xa3\x93E\x91d\xa4\x01\xb3\x98C\xb4,\x94@,\xb0\xb0\xd7\xe2\xf7\xf6U\x129B\xd6#u\x02\xc3\xe4l\xa3\x05V\x143\xe6\x85-\x88\x7f\x14\xc0\x1e\x1d\x19vQ\xbe\xc3\xda8\x06\xaf\xb9\xb7F\x91\xa6\xba&\xcb\xd7\xd0\x12\xed\xfd\xd3n\x06\xb6\x8fZ\xccpO\x05f\x...

DorkOrc
  • Take a look [here](https://stackoverflow.com/questions/19699367/unicodedecodeerror-utf-8-codec-cant-decode-byte#19706723). – Vasilis G. Nov 17 '17 at 14:41

2 Answers

3

If you are writing to a binary file, open it in binary mode, something like

binfile = open('bin.out', 'wb')

The 'wb' mode is the key: with it you can pass the bytes straight to write() without decoding them.
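As a minimal sketch of the binary-file route: the `ciphertext` value below is a hypothetical stand-in for whatever encrypt() returned, and the temporary path is only there to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes returned by cipher.encrypt().
ciphertext = bytes([0xC7, 0x2D, 0x20, 0xD9, 0x14, 0x5F])

# Temporary location so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "bin.out")

# 'wb' opens the file in binary mode, so write() accepts bytes directly.
with open(path, "wb") as binfile:
    binfile.write(ciphertext)

# 'rb' reads the same raw bytes back, ready to pass to decrypt().
with open(path, "rb") as binfile:
    restored = binfile.read()
```

No decoding step is involved at any point, so the UnicodeDecodeError cannot occur on this path.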

If you are writing to a text file, you'll need to use base64 or something similar to encode in a format that can be included in text. base64 and hex are the common options.

To encode in base64 do something like

import base64
b64_string = str(base64.b64encode(bytes_obj),'utf-8')

Then use b64decode to get the original bytes back
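A rough round trip through a text file might look like the following. The `ciphertext` bytes and the file name `cipher.txt` are assumptions made for illustration; the hex variant is included only because hex was mentioned as the other common option.

```python
import base64
import os
import tempfile

# Hypothetical stand-in for the bytes returned by cipher.encrypt().
ciphertext = bytes([0xC7, 0x2D, 0x20, 0xD9, 0x14, 0x5F])

# base64 output only uses ASCII characters, so decoding it as ASCII is safe.
b64_string = base64.b64encode(ciphertext).decode("ascii")

path = os.path.join(tempfile.mkdtemp(), "cipher.txt")

# Text mode is fine here because b64_string is an ordinary str.
with open(path, "w", encoding="ascii") as textfile:
    textfile.write(b64_string)

# Reading it back: b64decode returns the original bytes, not a str.
with open(path, "r", encoding="ascii") as textfile:
    restored = base64.b64decode(textfile.read())

# Hex alternative: twice the size of the input, but trivially readable.
hex_string = ciphertext.hex()
restored_from_hex = bytes.fromhex(hex_string)
```

base64 grows the data by about a third, hex doubles it; either way the file contains only plain ASCII text.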

Sam Hartman
0

AES ciphertext can contain any byte value (0x00 to 0xFF) in any order, but UTF-8 only accepts certain byte sequences: bytes 0x00 to 0x7F stand alone as single characters, while a lead byte such as 0xC7 must be followed by a continuation byte in the range 0x80 to 0xBF. Most random byte strings break these rules and so have no valid UTF-8 decoding, which seems to be the error you're getting.
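To illustrate where the error comes from, here is a small sketch. The helper `try_decode` is a hypothetical name introduced for this example; the two inputs are hand-picked bytes, not output from a real cipher.

```python
def try_decode(data):
    """Return (text, None) on success or (None, reason) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc.reason

# 0xC7 is the lead byte of a two-byte sequence; 0xA5 is a valid
# continuation byte (0x80 to 0xBF), so this pair decodes to one character.
ok_text, ok_err = try_decode(b"\xc7\xa5")

# Here 0xC7 is followed by 0x2D ('-'), which is not a continuation byte,
# so decoding fails at position 0.
bad_text, bad_err = try_decode(b"\xc7-")
```

Ciphertext is effectively random, so a lead byte followed by the wrong kind of byte shows up almost immediately in any non-trivial output.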

In fact, if you look closely at your sample text, what you see is Python's display of a bytes object, which already shows every printable ASCII byte as a character. Look at the beginning of your example:

\xa5- \xd9\x14_

This expression contains an ASCII hyphen, space, and underscore. The bytes shown as \x escapes are those that Python could not display as printable ASCII characters.
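The display of that fragment can be reproduced directly. This is only a sketch using the six bytes quoted from the question; the variable names are made up for the example.

```python
# The six bytes from the start of the sample in the question.
fragment = b"\xa5- \xd9\x14_"

# repr() shows printable ASCII bytes as characters and everything
# else as \x escapes; no decoding to text happens here.
shown = repr(fragment)

# The underlying values are plain integers in the range 0..255.
values = list(fragment)

# The three bytes that appear as characters are ordinary ASCII codes.
printable = bytes(b for b in fragment if 0x20 <= b <= 0x7E)
```

So the hyphen, space, and underscore in the output are single bytes (0x2D, 0x20, 0x5F) that happen to be printable, not evidence that the data is text.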

Sam Hartman
j15
    It's not called Unicode-8, but rather UTF-8. Also UTF-8 can use all values, but it has restrictions on what is valid. I corrected your text somewhat to reflect this. I apologize if I went too far and went beyond your intent – Sam Hartman Nov 17 '17 at 14:54