I just finished writing a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(). I'm now attempting to write the decompression step. My only problem is that I cannot convert my byte array back into a string. Is there a built-in function I could use to convert my byte array (held in a variable) back into a string? If not, is there a better way to convert my compressed string to something else? I tried byte_array.decode() and got the error shown after this code:

print("Index: ", Index) # The Index


# Substituting each symbol in the text with its code from the index

for x in range(len(TextTest)):

    TextTest[x]=Index[TextTest[x]]


NewText=''.join(TextTest)

# print(NewText)
# NewText=int(NewText)


byte_array = bytearray() # Converts the compressed string text to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))


NewSize = ("Compressed file Size:",sys.getsizeof(byte_array),'bytes')

print(byte_array)

print(NewSize)

x=bytes(byte_array)
x.decode()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte

sjakobi
  • You can convert it to a string by calling the [bytearray.decode()](https://docs.python.org/3/library/stdtypes.html#bytes.decode) method and supplying an encoding. For example: `byte_array.decode('ascii')`. If you leave the encoding argument out, it will default to `'utf-8'`. – martineau Nov 21 '18 at 07:15
  • Hey, I got this when I added your code: `byte_array.decode('ascii')` raised `UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 0: ordinal not in range(128)`. When I removed the `'ascii'` part I got: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte` – Mohamed Alremeithi Nov 23 '18 at 10:11
  • That means the data in your byte array doesn't contain valid characters in those encodings. You need to find an acceptable one. There are some [here](https://docs.python.org/3/library/codecs.html#binary-transforms) in the documentation; `'hex'` might be good. You can also use `'latin1'`, which maps the code points 0–255 to the bytes 0x0–0xff. Doing so will allow you to convert the result back to bytes later by using `the_string.encode('latin1')`. I first heard about doing this in [this answer](https://stackoverflow.com/a/22621777/355230) to an unrelated question (to solve a different problem). – martineau Nov 23 '18 at 10:43
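
The `'latin1'` round trip described in the last comment can be sketched as follows. This is a minimal illustration, not the asker's code: the byte values in `data` are made up for the example (0x88 is the byte from the error message), and the hex form uses the built-in `bytes.hex()` and `bytes.fromhex()` methods as one possible way to get a printable representation.

```python
# Hypothetical compressed output; 0x88 is the byte that broke ASCII/UTF-8 decoding.
data = bytearray([0x88, 0x00, 0xFF, 0x41])

# latin-1 assigns a code point to every byte value 0-255, so decoding never fails.
as_text = data.decode('latin1')

# Encoding with the same codec restores the original bytes exactly.
restored = as_text.encode('latin1')

# A hex string is another lossless, printable form of the same bytes.
as_hex = data.hex()
from_hex = bytes.fromhex(as_hex)

print(restored == bytes(data))  # both conversions are reversible
print(as_hex)
```

The resulting latin-1 string is not human-readable text; it is only a string-typed container for the same byte values.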

1 Answer

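You can use .decode('ascii'), or call .decode() with no argument to get the default, UTF-8. Note that this only works when the bytes are valid text in the chosen encoding.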

>>> print(bytearray("abcd", 'utf-8').decode())
abcd

Source: Convert bytes to a string?
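
For decompression, what is usually needed is the original string of '0' and '1' characters rather than decoded text. The sketch below shows one way to reverse the packing loop from the question, under stated assumptions: the helper names `bits_to_bytes` and `bytes_to_bits` are hypothetical, and the padding count is returned separately because a final chunk shorter than 8 bits otherwise loses its length when packed.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    # Number of zero bits needed to fill the last byte (0 if already aligned).
    padding = (8 - len(bits) % 8) % 8
    padded = bits + '0' * padding
    packed = bytearray(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes(packed), padding


def bytes_to_bits(data, padding):
    """Unpack bytes into a bit string and drop the padding bits at the end."""
    # format(byte, '08b') keeps leading zeros, so each byte yields exactly 8 bits.
    bits = ''.join(format(byte, '08b') for byte in data)
    return bits[:len(bits) - padding]


encoded = '1000100001011'  # made-up 13-bit Huffman output
packed, padding = bits_to_bytes(encoded)
decoded = bytes_to_bits(packed, padding)
print(decoded == encoded)
```

The padding count has to be stored alongside the compressed bytes (for example, in a one-byte header) so that the decompressor knows how many trailing bits to ignore.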

Dorian Turba