I'm new to Python 3 and I'm trying to extract a message from a bytes object that contains both strings and bytes.

I can't get the bytes part of the message back out of the decoded data.

  1. First, I decode the bytes object.
  2. Then I split the decoded string.
  3. Splitting gives me string values.

I tried `bytes(v) for v in rest.split()` to get the bytes back and then decode them, but it didn't work.

# The message chunk:
chunk = b"1568077849\n522\nb'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'\n"

# I split the chunk into sub categories for further processing:
_, size, rest = (chunk.decode("utf-8")).split('\n', 2)

# _ contains "1568077849"
# size contains "522" 
# rest contains "b'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'"

I should be able to decode the `rest` variable (`rest.decode("utf-8")`), but because it is stored as a string, I can't figure out how to convert it to bytes and then decode the value.

The expected result: l5:d4:auth53:ÙìH£ei6eli1eee
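A short sketch shows why calling `bytes(...)` or `.encode()` on `rest` doesn't recover the original bytes. It assumes the escaped form of the data, with literal backslash characters as in the real input quoted in the comments below. The variable names are illustrative.

```python
# Hypothetical value: the escaped payload with the b'...' wrapper already sliced off.
# "\\xc3" is four characters: a backslash, "x", "c", "3".
text = "l5:d4:auth53:\\xc3\\x99\\xc3\\xac"

# bytes() refuses a str unless an encoding is given.
try:
    bytes(text)
except TypeError as exc:
    print("TypeError:", exc)

# With an encoding, every character is encoded as-is: the backslash becomes
# byte 0x5c, so "\xc3" stays four ASCII bytes instead of the single byte 0xc3.
encoded = bytes(text, "utf-8")  # same result as text.encode("utf-8")
print(encoded)
print(len(encoded), encoded.count(b"\\"))
```

Encoding maps characters to bytes one for one. It never interprets the four characters `\xc3` as a single byte. That interpretation has to come from an escape decoder or a literal parser.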

Alok Nath Saha
  • How did you get this string? It looks like someone created it the wrong way. – furas Sep 10 '19 at 02:29
  • You could use slicing: `rest = rest[2:-2]` – nathancy Sep 10 '19 at 02:32
  • It's coming in from a server ```request = reader._build_request(chunk_meta) chunk = urllib.request.urlopen(request).read()``` – Alok Nath Saha Sep 10 '19 at 02:33
  • As @nathancy mentioned, you have to slice it, and then you should have the correct string `l5:d4:auth53:ÙìH£ei6eli1eee` – furas Sep 10 '19 at 02:38
  • @nathancy I need to be able to get the value in bytes to be able to decode it correctly. Currently, your solution is still giving rest as a string ``` print(isinstance(rest, bytes)) -> False ``` – Alok Nath Saha Sep 10 '19 at 02:38
  • I don't fully understand, maybe this?? `rest = bytes(rest[2:-2].encode('utf-8')) `. Then the type is bytes `print(isinstance(rest, bytes))` -> `True` – nathancy Sep 10 '19 at 02:49
  • @nathancy `rest` is stored as a string object ```l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee```, so if we encode it, it becomes a bytes object ```b'l5:d4:auth53:\\xc3\\x99\\xc3\\xac\\x1fH\\xc2\\xa3ei6eli1eee'```, which, when decoded, doesn't give the expected answer ```l5:d4:auth53:ÙìH£ei6eli1eee``` – Alok Nath Saha Sep 10 '19 at 02:55
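furas's suspicion can be checked with a sketch. The sketch assumes that the server builds each chunk by formatting a `bytes` payload into a text template, which the thread does not confirm. In that case Python writes the payload's repr, `b'...'` with `\xNN` escapes, instead of the raw bytes. The standard-library function `ast.literal_eval` parses that literal back into the original `bytes` object, which then decodes as UTF-8.

```python
import ast

# Hypothetical server side: formatting a bytes object into a str embeds its
# repr (b'...' with \xNN escapes), not the raw bytes themselves.
payload = b"l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee"
chunk = f"1568077849\n522\n{payload}\n".encode("utf-8")

# Client side: split off the two header lines, then parse the b'...' literal.
_, size, rest = chunk.decode("utf-8").split("\n", 2)
recovered = ast.literal_eval(rest.strip())  # evaluates the literal to a bytes object

print(type(recovered).__name__)
print(recovered == payload)
print(recovered.decode("utf-8"))
```

`ast.literal_eval` accepts only Python literals, so unlike `eval` it does not execute arbitrary code from the server response.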

2 Answers

This will print your final result:

chunk = b"1568077849\n522\nb'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'\n"

l1 = chunk.decode('utf-8').split('\n', 2)[2]  # Initial decode; keep everything after the 2nd newline
# split on '\n' only: a bare split() also breaks on the \x1f byte inside the payload
#  slice out the embedded byte string "b'  '" characters and the trailing newline
l1_string = l1.rstrip('\n')[2:-1]
l1_bytes = l1_string.encode('utf-8')
l1_final = l1_bytes.decode('utf-8')  # round trip: same text as l1_string

print('Results')
print(f'l1_string is {l1_string}')
print(f'l1_bytes is {l1_bytes}')
print(f'l1_final is {l1_final}')
Results
l1_string is l5:d4:auth53:ÙìH£ei6eli1eee
l1_bytes is b'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'
l1_final is l5:d4:auth53:ÙìH£ei6eli1eee
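The split step needs care because the payload contains the control byte `\x1f` (ASCII unit separator), and `str.split()` with no arguments treats that byte as whitespace. The following sketch uses the chunk from the question to compare a whitespace split with a newline split.

```python
chunk = b"1568077849\n522\nb'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'\n"
text = chunk.decode("utf-8")

print("\x1f".isspace())  # True: str.split() treats it as a separator

# Whitespace split: the payload is cut in two at \x1f, and \x1f itself is lost.
print(text.split()[2:])

# Newline split with maxsplit=2: the payload stays in one piece.
_, size, rest = text.split("\n", 2)
payload = rest.rstrip("\n")[2:-1]  # strip the b'...' wrapper
print(repr(payload))
```

This is also why slicing a fixed number of characters off each whitespace-separated piece can drop payload characters.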
DaveStSomeWhere
  • Thanks Dave!! It took some time to decode the answer :) – Alok Nath Saha Sep 10 '19 at 03:41
  • Dave, I just looked at the incoming data, and it's arriving as: ```chunk = b"1568077849\n522\nb'l5:d4:auth53:\\xc3\\x99\\xc3\\xac\\x1fH\\xc2\\xa3ei6eli1eee'\n"```. The input contains double backslashes, so the above solution doesn't work as expected. – Alok Nath Saha Sep 10 '19 at 04:09

I was able to get the expected output this way:

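_, size, rest = chunk.decode("utf-8").split("\n", 2)
rest = rest.strip()[2:-1]  # drop the surrounding b'...' and the trailing newline
rest = bytes(rest, "utf-8").decode("unicode_escape")  # "\xc3" -> code point U+00C3, etc.
rest = rest.encode("latin-1").decode("utf-8")  # reinterpret U+00C3 U+0099 as bytes c3 99 -> "Ù"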

I got the clue from this post: Process escape sequences in a string in Python
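On its own, the `unicode_escape` step does not produce the expected text for this data. It maps each `\xNN` escape to the code point U+00NN, so every UTF-8 byte pair comes out as two separate characters (for example `Ã` followed by a control character instead of `Ù`). The following sketch uses the escaped payload from the comment on the first answer. It shows that intermediate value and the latin-1 to UTF-8 reinterpretation that repairs it.

```python
# Escaped payload as it arrives (literal backslashes), with the b'...' wrapper removed.
escaped = "l5:d4:auth53:\\xc3\\x99\\xc3\\xac\\x1fH\\xc2\\xa3ei6eli1eee"

# Step 1: interpret the \xNN escapes. Each one becomes the code point U+00NN,
# so the UTF-8 pair c3 99 turns into two characters (mojibake).
step1 = escaped.encode("utf-8").decode("unicode_escape")
print(repr(step1))

# Step 2: latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# which recovers the original bytes; decoding those as UTF-8 gives the text.
step2 = step1.encode("latin-1").decode("utf-8")
print(repr(step2))
```

The latin-1 round trip works here because, after step 1, every character is below U+0100. It would raise `UnicodeEncodeError` if the text already contained characters outside that range.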

Alok Nath Saha