How can I convert bytes to a string without changing the data?
E.g.
Input:
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

Output:
'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

I want to write image data, plus some additional data, using StringIO. Below is my code snippet:

img_buf = StringIO()
f = open("Sample_image.jpg", "rb")
file_data = f.read()
img_buf.write('\r\n' + file_data + '\r\n')

This works fine with Python 2.7, but I want it to work with Python 3.4.
There, the read operation file_data = f.read() returns a bytes object, something like this:

b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'  

When writing data, img_buf accepts only string data, so I am unable to write file_data together with the additional characters. So I want to convert file_data, as it is, into a string object without changing its data. Something like this:

'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'  

so that I can concatenate and write the image data.

I don't want to decode or encode the data. Any suggestions would be helpful. Thanks in advance.

Bharat Jogdand
  • 2
    Are you asking how to convert the bytes data to a string? Just `my_string = file_data.decode('utf-8')`? (Realize that decoding is literally converting bytes to a string... are you **sure** you don't want to decode it?) – Scott Mermelstein Mar 23 '18 at 15:35
  • `my_string = file_data.decode('utf-8')` Gives error as `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 0: invalid start byte` – Bharat Jogdand Mar 23 '18 at 15:38
  • I won't try to close this as a duplicate, because I don't yet understand what you want, but does https://stackoverflow.com/questions/13837848/converting-byte-string-in-unicode-string?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa solve your problem? – Scott Mermelstein Mar 23 '18 at 15:40
  • No it doesn't solve my problem. I just want bytes data to string as it is without changing bytes data. – Bharat Jogdand Mar 23 '18 at 15:45
  • 1
    Actually, I'll step it back a bit. Please read [ask]. Your title mentions "bytes data of an image". This is presumably 64-bit encoded image data that you're loading. With the appropriate functions, you can convert this data to an image. Why do you want it to be a string? Can you please [edit] your question to tell us what exactly you're trying to do with it? – Scott Mermelstein Mar 23 '18 at 15:45
  • Yes sure @ScottMermelstein – Bharat Jogdand Mar 23 '18 at 15:55
  • 1
    You'll need to explain better what you mean by "without changing data". The data doesn't change; only the way it is interpreted does when you use decode. I'll just leave this here as recommended reading, and wish you good luck. https://docs.python.org/3.3/howto/unicode.html – Scott Mermelstein Mar 23 '18 at 15:55
  • Thanks @ScottMermelstein – Bharat Jogdand Mar 23 '18 at 15:57

2 Answers

0

It is not clear what kind of output you desire. If you are interested in aesthetically translating bytes to a string representation without encoding:

s = str(file_data)[1:]
print(s)
# '\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

This is the informal string representation of the original byte string (no conversion).


Details

The official string representation looks like this:

s
# "'\\xb4\\xeb7s\\x14q[\\xc4\\xbb\\x8e\\xd4\\xe0\\x01\\xec+\\x8f\\xf8c\\xff\\x00 \\xeb\\xff'"

String representation handles how a string looks. The doubled backslashes and the outer double quotes are part of how Python displays the string; the print function resolves them and outputs the formatted string.

String interpretation handles what a string means. Each block of characters means something different depending on the applied encoding. Here we interpret these blocks of characters (e.g. \\xb4, \\xeb, 7, s) with the UTF-8 encoding. Blocks unrecognized by this encoding are replaced with a default character, �:

file_data.decode("utf-8", "replace")
# '��7s\x14q[Ļ���\x01�+��c�\x00 ��'

Decoding bytes to a string is required before you can reliably work with the data as text.

In short, there is a difference in string output between how it looks (representation) and what it means (interpretation). Clarify which you prefer and proceed accordingly.
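To make the distinction concrete, here is a minimal sketch using the sample bytes from the question. It assumes Python 3 run without the -b flag; the variable names (shown, lengths, utf8_ok) are only illustrative.

```python
# Sample bytes copied from the question.
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

# Representation: str() on a bytes object returns its printable form,
# including the leading b and the quotes. The data itself is not decoded.
shown = str(file_data)

# The representation is longer than the data, because a single byte such
# as 0xb4 is spelled out as the four characters \, x, b, 4.
lengths = (len(file_data), len(shown))

# Interpretation: decoding maps bytes to code points under a chosen encoding.
# Strict UTF-8 rejects this data, as reported in the comments on the question.
try:
    file_data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(lengths, utf8_ok)
```

The representation is useful for display and debugging only; writing it to a file would store the escape sequences as text rather than the original bytes.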

Addendum

If your question is "how do I concatenate a byte string?", here is one approach:

import io

buffer = io.BytesIO()
with buffer as f:
    f.write(b"\r\n")
    f.write(file_data)
    f.write(b"\r\n")
    print(buffer.getvalue())
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'

Equivalently:

buffer = b""
buffer += b"\r\n"
buffer += file_data
buffer += b"\r\n"
buffer
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'
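Applied to the use case in the question, the same idea might look like the sketch below. The helper name frame_with_crlf and the note text are hypothetical, and the sample bytes stand in for the contents of the image file. Any text that has to be added is encoded to bytes first, so the image bytes are written untouched.

```python
import io


def frame_with_crlf(payload: bytes, note: str = "") -> bytes:
    """Hypothetical helper: write an optional text note, then payload wrapped in CRLF."""
    buf = io.BytesIO()
    # Text additions are encoded to bytes; ASCII is an assumption for this sketch.
    buf.write(note.encode("ascii"))
    # The binary payload is written as-is, with no decoding step.
    buf.write(b"\r\n" + payload + b"\r\n")
    return buf.getvalue()


# Stand-in for the result of open("Sample_image.jpg", "rb").read().
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
framed = frame_with_crlf(file_data, note="image follows")
print(framed)
```

Keeping everything in bytes avoids the question of which text encoding to use for the image data at all.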
pylang
  • I tried both of the ways you mentioned in your answer. `file_data = str(file_data)[2:-1]` returns a string like `'\\xb4\\xeb7s\\x14q[\\xc4\\xbb\\x8e\\xd4\\xe0\\x01\\xec+\\x8f\\xf8c\\xff\\x00 \\xeb\\xff'`, which contains escaped backslashes, and I don't want escaped backslashes. I have added a brief explanation to the question for more information. Thanks @pylang – Bharat Jogdand Mar 23 '18 at 22:37
  • `s` has escaped characters because it is a string. To my knowledge, you cannot change that (and probably should not try). It seems like you just want to append data to a byte string, correct? If so, is it necessary to have a string result? How about modifying a byte string instead? – pylang Mar 24 '18 at 00:10
-2

The image data obviously isn't UTF-8 encoded, or in any other text encoding. An image is raw data, not any form of text.

But there is an encoding that maps every byte value from 0 to 255 to the character with the same code:

data = data.decode("latin1")

This changes the data type from bytes to str.
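A short sketch of that round trip, using the sample bytes from the question. The variable names are illustrative, and newline="" is passed to StringIO here so that the "\r\n" sequences are stored without any newline translation.

```python
import io

# Sample bytes copied from the question.
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

# latin1 maps each byte value to the code point with the same number.
text = file_data.decode("latin1")
same_values = [ord(ch) for ch in text] == list(file_data)

# Encoding with the same codec gives back the original bytes.
restored = text.encode("latin1")

# The decoded string can be concatenated and written to a StringIO,
# as in the Python 2 code from the question.
img_buf = io.StringIO(newline="")
img_buf.write('\r\n' + text + '\r\n')

print(same_values, restored == file_data)
```

Note that the result is only meaningful as a container for byte values: whatever eventually writes it out has to encode it with latin1 again to recover the image data.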

It isn't a brilliant solution, because it consumes CPU time and memory to create a new object, but it is the only one.

It is a nuisance that Python has no instruction to simply change the data type from bytes to str without processing.

Massimo