Good evening.
I'm developing a Python 3 script to encrypt my backup files, and I am having problems when recovering an encrypted file. To show exactly what I am doing, step by step, suppose a file "example.txt" whose only content is the word "Test": a 5-byte file. First point: I am using aes-everywhere for encryption. To read and encrypt the data in the file:
from AesEverywhere import aes256

with open("example.txt", "rb") as archive:
    data = archive.read()
# encrypt() expects a str, so the bytes are decoded first
original_data = aes256.encrypt(data.decode(), "MyKey")
# Encrypt and overwrite:
with open("example.txt", "wb") as archive:
    archive.write(original_data)
The decode() is there because the file is read as bytes, but encrypt() takes a string. So far so good. Opening the file (now 44 bytes), the content looks something like this:
U2FsdGVkX18mxZYHtNTojCiYaQtUMHJwXi2Hbmez950=
Next comes the code to recover the data. It is almost identical:
with open("example.txt", "rb") as archive:
    data = archive.read()
recovered_data = aes256.decrypt(data, "MyKey")
# Decrypt and overwrite:
with open("example.txt", "wb") as archive:
    archive.write(recovered_data)
The problem starts here. With certain files, I get the following error: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte. I tried different encodings (ANSI, UTF-8, etc.), but the error persists. Topics I read:
- utf-8-codec-cant-decode-byte-0xa0-in-position-4276-invalid-start-byte
- how-to-solve-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-position
- python3-fix-unicodedecodeerror-utf-8-codec-can-t-decode-byte-in-position
- how-to-fix-error-UnicodeDecodeError-utf-8-codec-cant-decode-byte
There are a few more, but they all repeat more or less the same thing, in other languages.

With other files, no error occurs, but the original data is not recovered. Strange characters appear, as if random bytes had been written instead of the originals (the same thing you see when opening an executable in a text editor). I suspect I am saving a bytes representation of the bytes rather than the data itself. I ran some tests with different combinations of encode()/decode() in the code that saves the files, but I could not recover the original data or get any meaningfully different result.

Any tips? For reference, I'm using Python 3.9.5 on Fedora 34, and I'm only trying to encrypt/recover small files, with the following extensions:
txt, pdf, odt, xls, png, jpg, jpeg, epub, mp3, gif, doc, odp, ods, mp4
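
To narrow things down, here is a minimal sketch that seems to reproduce the same error without any encryption library. It assumes the failure comes from calling decode() on file contents that are not valid UTF-8; binary_data and try_decode are hypothetical names made up for this test, and the four bytes are only a stand-in for a real binary file such as a PNG or MP3. The second half checks that Base64 (standard library) can turn arbitrary bytes into plain text and back without loss, which might be one way to feed binary data to a function that only accepts strings. I have not verified that idea against aes-everywhere itself.

```python
import base64

# Hypothetical stand-in for the contents of a binary file;
# 0xa8 cannot be the first byte of a UTF-8 sequence.
binary_data = bytes([0xA8, 0x00, 0xFF, 0x10])


def try_decode(data: bytes) -> str:
    """Return the decoded text, or the error message if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


# Plain text decodes fine; the binary sample does not.
text_result = try_decode(b"Test\n")
decode_result = try_decode(binary_data)
print(repr(text_result))
print(decode_result)

# Base64 maps arbitrary bytes to ASCII text and back without loss.
as_text = base64.b64encode(binary_data).decode("ascii")
restored = base64.b64decode(as_text)
print(restored == binary_data)
```

If this reading is right, text files would survive the decode() step while pdf, png, mp3 and similar files would fail or come back corrupted, which matches what I am seeing.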