
Good evening.

So, I'm developing a script in Python 3 to encrypt my backup files, and I am having problems when recovering (decrypting) a file. To show exactly what I am doing, step by step, suppose there is an "example.txt" file whose only content is the word "Test": a 5-byte file. First point: I am using aes-everywhere for encryption. To read and encrypt the data in the file:

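from AesEverywhere import aes256

# Read the file as raw bytes:
with open("example.txt", "rb") as archive:
    data = archive.read()
# encrypt() takes a string, hence the decode():
original_data = aes256.encrypt(data.decode(), "MyKey")
# Encrypt and overwrite:
with open("example.txt", "wb") as archive:
    archive.write(original_data)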

There is a decode() because the file is read as bytes, but the function encrypt() takes a string. So far so good. Opening the file (now 44 bytes), the content looks something like this:

U2FsdGVkX18mxZYHtNTojCiYaQtUMHJwXi2Hbmez950=

The code to recover the data follows. It is almost identical:

with open("example.txt", "rb") as archive:
    data = archive.read()
recovered_data = aes256.decrypt(data, "MyKey")
# Decrypt and overwrite:
with open("example.txt", "wb") as archive:
    archive.write(recovered_data)

The problem starts here. For certain files, I get the following error: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte. I tried different encodings (ANSI, UTF-8, etc.), but the error persists. Topics I read:

  1. utf-8-codec-cant-decode-byte-0xa0-in-position-4276-invalid-start-byte
  2. how-to-solve-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-position
  3. python3-fix-unicodedecodeerror-utf-8-codec-can-t-decode-byte-in-position
  4. how-to-fix-error-UnicodeDecodeError-utf-8-codec-cant-decode-byte

There are a few more, but they all repeat more or less the same thing, in other languages.

For other files, no error occurs, but the saved data is not recovered. Strange characters appear, as if random bytes had been written instead of the originals (the same as what appears when opening an executable in a text editor). I suspect I am saving the bytes of the bytes of the data. I ran some tests with different combinations of encode()/decode() in the code that saves the files, but I was unable to recover the original data or get any significantly different result. Any tips?

For reference, I'm using Python 3.9.5 on Fedora 34, and I'm only trying to encrypt/recover small files with the following extensions:

txt, pdf, odt, xls, png, jpg, jpeg, epub, mp3, gif, doc, odp, ods, mp4

  • `data.decode()` only makes sense if data is already the valid encoding of a string. For image files, mp4, and many others this is simply not the case. That is why an AES library that only accepts string arguments is a bad choice for Python. However, you can still use it if you must: base64-encode the bytes you've read in, then encrypt the base64-encoded string. To decrypt, reverse the process: decrypt, base64-decode, and write out the result. Consider using the [cryptography](https://cryptography.io/en/latest/) library or pycryptodome instead. – President James K. Polk May 27 '21 at 03:57
  • Thank you very much, I hadn't thought of that. I will test these libraries right now. – user16042504 May 27 '21 at 11:14
  • Both the base64 method and the pycryptodome library worked perfectly. I went with the latter. Again, thank you very much. – user16042504 May 27 '21 at 17:34
