How to handle incoming "b'xxx'" string

Question

I received the following data from another device:

foo = { "abc": "b'E3:DE'" }

I know the "b" prefix denotes bytes in Python 3. My intent is to convert the value into a plain string. My Python version treats it as the unicode type. I tried several approaches, but none of them work: the "b" prefix is always there, and it is even treated as an ordinary character that can be uppercased.

foo = xxx.get("abc")
logger.info("1 foo type {0} against {1} isinstance(foo, unicode) {2}".format(type(foo), type(b''), isinstance(foo, unicode)))
logger.info("2 before anything {0}".format(foo))
foo1 = foo.encode("utf-8")
logger.info("3 after encode foo1 {0} type {1} upper {2}".format(foo1, type(foo1), foo1.upper()))
bar = foo.decode("utf-8")
logger.info("4 after decode bar {0} type {1} upper {2}".format(bar, type(bar), bar.upper()))

Output:

INFO|1 foo type <type 'unicode'> against <type 'str'> isinstance(foo, unicode) True
INFO|2 before anything b'E3:DE'
INFO|3 after encode foo b'E3:DE' type <type 'str'> upper B'E3:DE'
INFO|4 after decode foo b'E3:DE' type <type 'unicode'> upper B'E3:DE'

Is there a built-in function to convert this unicode value with the "b" prefix into a string without it? Or do I have to take a substring to get rid of it?

Hi, I tried the methods in the link, but cannot solve the issue.. — nathan, Oct 07 '22 at 03:43
Do you actually have `{"abc": "b'E3:DE'"}`, or do you have `{"abc": b"E3:DE"}`... There's a big difference. — BeRT2me, Oct 07 '22 at 03:44
If that's true, then `b` means absolutely nothing, you just have a string that starts with `b` and has extra `'` in it as well. Take the substring. — BeRT2me, Oct 07 '22 at 03:46
Either the "other device" is improperly handling bytes (if it also runs Python) or you are improperly reading from it. — gre_gor, Oct 07 '22 at 03:53
@BeRT2me I see. May I ask why the type is "unicode" after my get operation. To substring, I have to convert unicode into string, lol — nathan, Oct 07 '22 at 03:53
@gre_gor other fields are good. Eg "oxxx": "67ceac", I suspect the other device improperly handles this particular one. — nathan, Oct 07 '22 at 03:57
Looks like the other device is converting bytes to string with `str(b)` instead of `b.decode()`. — gre_gor, Oct 07 '22 at 04:09
The proper fix would be to fix the code on the other device. — gre_gor, Oct 07 '22 at 04:14
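
To illustrate the `str(b)` versus `b.decode()` point from the comments above, here is a minimal sketch. It assumes the sending device runs Python 3 and holds the value as bytes; the variable names are hypothetical and not taken from the actual device code.

```python
# Assumption: the sender runs Python 3 and has the value as a bytes object.
raw = b"E3:DE"

# str() on bytes returns the repr, so the b prefix and the quotes
# become part of the text that gets sent.
wrong = str(raw)

# decode() returns only the text content.
right = raw.decode("utf-8")

print(wrong)
print(right)
```

If the sender does something like the first call, the receiver sees the prefix and quotes as ordinary characters, which would match the behaviour described in the question.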

score -2 · Answer 1 · answered Oct 07 '22 at 03:55

You likely have a string that literally starts with the character "b", rather than a bytes-literal prefix. Judging from the log output, this string is of the unicode type. So I think this is your situation:

x = u"b'E3DE'"
x
#u"b'E3DE'"
type(x)
#<type 'unicode'>

Since the "b" is literal, you need to take the substring between b' and '. One way to do this is with a regular expression, as below.

import re
r = re.search(r"b'([^']*)'", x)
r.group(1)
#u'E3DE'

If you want a str instead of unicode, you can use the encode method.

s = r.group(1).encode()
s
#'E3DE'
type(s)
#<type 'str'>

Simply removing `b'` and `'` won't properly handle potential escaped characters. — gre_gor, Oct 07 '22 at 04:17
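
One way to address the escaped-character concern in the comment above is to parse the text as a Python literal with `ast.literal_eval` rather than slicing it. This is only a sketch: the helper name `unwrap_bytes_repr` is hypothetical, and it assumes the payload is UTF-8 and that the incoming text really is the repr of a bytes object.

```python
import ast


def unwrap_bytes_repr(text):
    """Hypothetical helper: turn text such as "b'E3:DE'" into the string it encodes."""
    # literal_eval only parses literals; it does not execute arbitrary code.
    # It raises ValueError or SyntaxError if the text is not a valid literal.
    value = ast.literal_eval(text)
    if isinstance(value, bytes):
        # Assumption: the original bytes were UTF-8 encoded text.
        value = value.decode("utf-8")
    return value


print(unwrap_bytes_repr("b'E3:DE'"))
print(unwrap_bytes_repr(r"b'it\'s'"))
```

Because the parser resolves escapes such as `\'` or `\x41` itself, this avoids the problem of a plain substring or regular expression leaving the backslashes in place. Fixing the sender, as suggested in the question comments, is still the cleaner option.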
