I've looked into similar errors on Stack Overflow, but nothing helped. I have implemented the IDEA algorithm, which takes hex-encoded input (16 hex digits, i.e. 64 bits, the IDEA plaintext block size). For example, with UTF-8 encoding/decoding:

KEY = int('006400c8012c019001f4025802bc0320', 16)
plain_text = 'HiStackO'
cryptor = IDEA(KEY)  # Initialize cryptor with 128bit key
cipher_text = cryptor.encrypt(plain_text)
deciphered_text = cryptor.decrypt(cipher_text)

The encrypt/decrypt functions are shown below. Output:

Original text = HiStackO
Hex encoded text = 4869537461636b4f
Ciphered text = b6315c103ab29de1
Deciphered text = HiStackO

I am facing issues with some text strings. For example, 'thinghwr' gets encrypted and decrypted successfully, but for 'thingher' I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 1: invalid continuation byte

I've tried latin-1 and other codecs, but the result isn't the original.
As for bytes, I am trying to encrypt an MP3 file by reading 8 bytes at a time, decoding and encrypting them, and writing the result to a new encrypted file:

cryptor = IDEA(KEY)  # Initialize cryptor with 128bit key

in_file = open("song.mp3", "rb")
out_file = open("encrypted.mp3", "w")

bytes8 = in_file.read(8)

while bytes8:
    res = cryptor.encrypt(bytes8.decode("latin-1"), codec="latin-1")
    print(res)
    res = ''.join('0' * (16 - len(res))) + res
    out_file.write(res)
    bytes8 = in_file.read(8)

in_file.close()
out_file.close()

Each 'res' is 16 hex digits containing the encrypted block, which is written to the file.
The file gets encrypted with no errors.

As for decryption I am using the following method:

in_file = open("encrypted.mp3", "r")
out_file = open("decrypted.mp3", "wb")

bytes8 = in_file.read(16)
while bytes8:
    res = cryptor.decrypt(bytes8)
    print(res)
    out_file.write(res.encode())
    bytes8 = in_file.read(16)

in_file.close()
out_file.close()

During decryption, after a few successful blocks, the following error shows up:

line 136, in decrypt
    return bytes.fromhex(res).decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0: invalid start byte

Line 136 which is decrypt function:

        res = self.calculate_cipher(self.dec_sub_keys, cipher_text)
        res = ''.join('0' * (16 - len(res))) + res
        return bytes.fromhex(res).decode()

I've tried different encodings, but nothing works. What am I doing wrong here? I am new to Python and haven't dealt with codecs before.
Encrypt/decrypt functions:

def encrypt(self, plain_text='', is_hex=False, codec='utf-8'):
    if not is_hex:
        plain_text = plain_text.encode(codec).hex()
    plain_text = get_bin_block(plain_text)
    return self.calculate_cipher(self.enc_sub_keys, plain_text)

def decrypt(self, cipher_text='', codec='utf-8'):
    cipher_text = get_bin_block(cipher_text)
    res = self.calculate_cipher(self.dec_sub_keys, cipher_text)
    res = ''.join('0' * (16 - len(res))) + res
    return bytes.fromhex(res).decode(codec)

get_bin_block is a function that converts the hex text into four 16-bit binary blocks for the encryption/decryption calculation.

Adam Ma
  • "Line 136 which is decrypt function:" Is this part also your own code? – Karl Knechtel Jun 13 '20 at 22:12
  • Yes, all of it. As you can see in "res = cryptor.decrypt(bytes16.decode())", the crash occurs only when bytes are used. Please let me know if there is anything I can provide to help resolve the issue. Is there another way to restore the decrypted hex to its original data? – Adam Ma Jun 13 '20 at 22:21
  • Do you expect that this hex string contains valid UTF-8 data? Why are you using `.decode()`? – remram Jun 13 '20 at 22:24
  • @remram I am not really familiar with any way to do this as I'm new to using codecs and Python too, would appreciate any suggestion for alternative way – Adam Ma Jun 13 '20 at 22:38

1 Answer

    res = self.calculate_cipher(self.dec_sub_keys, cipher_text)
    res = ''.join('0' * (16 - len(res))) + res
    return bytes.fromhex(res).decode()

When you call calculate_cipher, you can end up with an arbitrary sequence of hex digits, depending on the cipher_text.

When you then attempt to .decode() the corresponding bytes, Python tries the UTF-8 encoding by default. This encoding cannot interpret every possible byte sequence as text; some values and sequences are illegal. You say that you "tried different encoding"; but you would have to pick one that actually works for this purpose, and also make sure to use it consistently across the entire program.
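To make the point about illegal sequences concrete, here is a small sketch. The block value is made up for illustration, but it starts with the two bytes (0x8b and 0xcd) reported in the tracebacks in the question. It shows UTF-8 rejecting the block while latin-1 round-trips it unchanged.

```python
# Illustrative 8-byte block (an assumption, not real cipher output).
# 0x8b cannot start a UTF-8 sequence, so decoding as UTF-8 fails.
block = bytes.fromhex("8bcd000102030405")

try:
    block.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 maps each of the 256 byte values to exactly one code point,
# so decoding never fails and re-encoding gives the original bytes back.
text = block.decode("latin-1")
round_trip = text.encode("latin-1")

print(utf8_ok)              # False
print(round_trip == block)  # True
```

Note that the round trip only holds if the same codec is named on both sides. Calling `.encode()` with no argument uses UTF-8, which would not restore the original bytes from latin-1-decoded text.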

The problem isn't that you're reading a binary file. The problem is that you're trying to store the encrypted data as binary through a very convoluted system: figuring out the hex digits (as text), finding the bytes corresponding to those digits, decoding those bytes back into a string to return from the module, and then encoding them again to write to the file. If you want to generate bytes, you should just generate bytes directly. Alternatively, there's nothing that says you can't write a text file representing the encrypted results from a binary file, or vice versa, as long as you can show that the process is reversible.
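One way to "generate bytes directly" is sketched below, under some loud assumptions. `toy_block_cipher` is a hypothetical stand-in for the real 64-bit IDEA block transform (it is a plain XOR and offers no security), and `transform_bytes` is an illustrative helper name, not part of the question's class. The point is only the data flow: bytes in, integers inside, fixed-width bytes out, with no text codec anywhere.

```python
BLOCK_SIZE = 8  # IDEA operates on 64-bit (8-byte) blocks


def toy_block_cipher(block: int) -> int:
    """Hypothetical stand-in for the real block transform (NOT secure)."""
    return block ^ 0xA5A5A5A5A5A5A5A5


def transform_bytes(data: bytes, block_fn) -> bytes:
    """Apply block_fn to each 8-byte block of data and return bytes."""
    if len(data) % BLOCK_SIZE != 0:
        # Padding the final block is out of scope for this sketch.
        raise ValueError("data length must be a multiple of 8 bytes")
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        value = int.from_bytes(data[i:i + BLOCK_SIZE], "big")
        # to_bytes with a fixed width keeps leading zero bytes,
        # so no manual zero-padding of a hex string is needed.
        out += block_fn(value).to_bytes(BLOCK_SIZE, "big")
    return bytes(out)


# Every possible byte value, including ones that are invalid as UTF-8.
original = bytes(range(256))
encrypted = transform_bytes(original, toy_block_cipher)
# XOR is its own inverse, so the same stand-in also "decrypts".
decrypted = transform_bytes(encrypted, toy_block_cipher)
print(decrypted == original)  # True
```

With this shape, both the encrypted and decrypted files would be opened in binary mode ("rb" and "wb"), and the result of `transform_bytes` written as-is.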

It is very important that you have a proper understanding of the fundamentals here. You cannot expect to skip steps, just get a solution for the current problem and move on with your life - you will just stumble again at the next opportunity.

Karl Knechtel
  • thanks, yes, writing the encryption to a text file is a good idea and I've already tried it and will use it, as for a codec in such case, do you have any suggestions of a codec? as I'm new to this topic and been staring at this issue all day – Adam Ma Jun 13 '20 at 22:38
  • `latin-1` (also named `iso-8859-1`) should do the trick. See for example https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1 . I was reluctant to mention it because it's so much better to change the general approach :) – Karl Knechtel Jun 13 '20 at 22:39
  • I've used your suggested encoding: I decode the bytes I read with latin-1, write the encrypted file, then decrypt and decode using latin-1. Now I'm getting a different error, "ValueError: non-hexadecimal number found in fromhex() arg at position 17". Can you suggest a fix, or a different approach from the one I'm using? – Adam Ma Jun 13 '20 at 23:18
  • Ask a new question and include a [SSCCE](http://www.sscce.org). It's too difficult to guess what you've gotten wrong. – Karl Knechtel Jun 13 '20 at 23:51
  • I just found out that the issue also happens with plain text(UnicodeDecodeError: 'utf-8'..) for example 'thinghwr' gets decrypted/encrypted successfully but for 'thingher' I get UnicodeDecodeError, I've tried latin-1 and other encoders but the result isn't the original. I've edited the main topic and added the encrypt/decrypt functions – Adam Ma Jun 14 '20 at 10:51