Note: I don't know much about encoding/decoding, and after running into this problem those words are complete jargon to me.
Question:
I'm a little confused here. I was playing around with encoding/decoding images in order to store an image in a TextField in a Django model. Looking around Stack Overflow, I found I could decode an image from ascii (I think, or binary? Whatever open('file', 'wb') uses as its encoding; I'm assuming the default, ascii) to latin1 and store it in a database with no problems.
The problem comes from recreating the image from the latin1-decoded data. When attempting to write it to a file handle I get a UnicodeEncodeError saying that ascii encoding failed.
I think the problem is that when a file is opened as binary data (rb), its contents are not properly ascii-encoded, because they are raw binary data. I then decode the binary data to latin1, but converting back to ascii (which happens automatically when I try to write to the file) fails for some unknown reason.
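To make the suspicion above concrete, here is a minimal sketch of what the ascii codec can and cannot encode. It assumes a stand-in sample of all 256 byte values instead of a real image (the names raw, text, and ascii_failed are made up for illustration), and it is written so that it should run on both Python 2 and Python 3.

```python
# Hypothetical stand-in for raw image data: every possible byte value once.
raw = bytes(bytearray(range(256)))
text = raw.decode('latin1')

# Code points 0-127 are plain ASCII, so this part encodes without error.
ascii_part = text[:128].encode('ascii')

# Code points 128-255 have no ASCII representation, so the full string fails.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(ascii_failed)
```

If this reading is right, the error is not about the data being damaged. It only says that some characters fall outside the 0-127 range that ascii covers.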
My guess is that either decoding to latin1 converts the raw binary data to something else, so that encoding back to ascii can no longer recognize what was once raw binary data (although the original and decoded data have the same length), or the problem lies not with decoding to latin1 but with attempting to ascii-encode binary data. In that case, how would I encode the latin1 data back to an image?
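One way to test the first guess is to check whether decoding with latin1 changes any values at all. The sketch below assumes the same kind of stand-in data (all 256 byte values, not a real image), and the variable names are illustrative only. It checks that each byte maps to the character with the same code point, and that encoding with latin1 gives the original bytes back.

```python
# Hypothetical stand-in for raw image data: every possible byte value once.
raw = bytes(bytearray(range(256)))
text = raw.decode('latin1')

# latin1 maps byte value n to the character with code point n.
code_points = [ord(ch) for ch in text]
one_to_one = code_points == list(range(256))

# Encoding with the same codec reverses the decode step.
restored = text.encode('latin1')
round_trip_ok = restored == raw

print(one_to_one, round_trip_ok, len(text) == len(raw))
```

This would also explain why the original and decoded data have the same length: one byte becomes exactly one character.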
I know this is very confusing, but I'm confused by all of it, so I can't explain it well. If anyone can answer this question, they're probably a riddle master.
Some code to visualize:
>>> image_handle = open('test_image.jpg', 'rb')
>>>
>>> raw_image_data = image_handle.read()
>>> latin_image_data = raw_image_data.decode('latin1')
>>>
>>>
>>> # The raw data can't be processed by django
... # but in `latin1` it works
>>>
>>> # Analysis of the data
>>>
>>> type(raw_image_data), len(raw_image_data)
(<type 'str'>, 2383864)
>>>
>>> type(latin_image_data), len(latin_image_data)
(<type 'unicode'>, 2383864)
>>>
>>> len(raw_image_data) == len(latin_image_data)
True
>>>
>>>
>>> # How to write back to a file?
>>>
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>>
>>> copy_image_handle.write(raw_image_data)
>>> copy_image_handle.close()
>>>
>>>
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>>
>>> copy_image_handle.write(latin_image_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>>
>>>
>>> latin_image_data.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>>
>>>
>>> latin_image_data.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
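
For comparison with the failing write above, here is a hedged sketch of writing the text back with an explicit latin1 encode rather than relying on the implicit ascii one. The helper name write_latin1_text_as_bytes, the temporary directory, and the sample bytes are all assumptions made for illustration. The sample starts with FF D8 FF E0, which is a common JPEG file header, and all four of those bytes are above 127.

```python
import os
import shutil
import tempfile


def write_latin1_text_as_bytes(path, text):
    """Encode text back to bytes with latin1 and write it in binary mode."""
    with open(path, 'wb') as handle:
        handle.write(text.encode('latin1'))


# Hypothetical sample: a JPEG-style header followed by every byte value.
raw_image_data = b'\xff\xd8\xff\xe0' + bytes(bytearray(range(256)))
latin_image_data = raw_image_data.decode('latin1')

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'new_test_image.jpg')
write_latin1_text_as_bytes(path, latin_image_data)

# Read the file back to compare it with the original bytes.
with open(path, 'rb') as handle:
    written = handle.read()

shutil.rmtree(tmp_dir)
print(written == raw_image_data)
```

The only difference from the failing session is the explicit .encode('latin1') before the write, so the file handle receives bytes and no implicit ascii conversion is attempted.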