149

I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

The payload comes in as a byte string, as my variable name suggests.

I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.

The example shows:

str(b'abc','utf-8')

How can I apply the b (bytes) keyword argument to my variable bytes_string and use the recommended approach?

The way I tried doesn't work:

str(bbytes_string, 'utf-8')
sjakobi
DjangoTango
  • Does this answer your question? [Convert bytes to a string](https://stackoverflow.com/questions/606191/convert-bytes-to-a-string) – Josh Correia Oct 21 '20 at 22:59

4 Answers

250

You had it nearly right in the last line. You want

str(bytes_string, 'utf-8')

because the type of bytes_string is bytes, the same as the type of b'abc'.
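For context, here is a minimal sketch of the whole round trip. It assumes the payload is fetched with get_payload(decode=True), which returns bytes in Python 3. The sample message, the file name note.xml, and the helper attachment_texts are made up for illustration and are not from the question.

```python
from email.message import EmailMessage


def attachment_texts(msg, encoding="utf-8"):
    """Return each attachment's payload decoded to str (hypothetical helper)."""
    texts = []
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (e.g. base64) and yields bytes
        bytes_string = part.get_payload(decode=True)
        texts.append(str(bytes_string, encoding))
    return texts


# Build a throwaway message with one XML attachment (illustrative data only).
msg = EmailMessage()
msg.set_content("See the attached file.")
msg.add_attachment(
    "<note>café</note>".encode("utf-8"),
    maintype="application",
    subtype="xml",
    filename="note.xml",
)

print(attachment_texts(msg))
```

If the attachment declares a different charset, pass that name as encoding instead of relying on the 'utf-8' default.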

ndmeiri
Toby Speight
  • `str(bytes_string, 'utf-8', 'ignore')` Errors can be ignored by passing the third parameter. – Shubhamoy Jun 08 '18 at 06:14
  • That looks like it should be a comment to [pylang's answer](/a/49457333) (which addresses handling invalid input). If (you believe that) there's nothing wrong with `bytes_string`, why would you want to ignore errors? – Toby Speight Jun 18 '18 at 08:36
  • I am getting following error with your approach: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte` for the following bytes string `b'\xbf\x8cd\xba\x7f\xe0\xf0\xb8t\xfe.TaFJ\xad\x100\x07p\xa0\x1f90\xb7P\x8eP\x90\x06)0'` @TobySpeight – alper Feb 28 '19 at 08:41
  • Well @alper, that's not a valid UTF-8 string, so what did you expect? – Toby Speight Feb 28 '19 at 09:58
58

Call decode() on a bytes instance to get the text which it encodes.

text = bytes_string.decode()  # UTF-8 is the default encoding
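A short sketch of decode() called on a bytes instance rather than on the bytes type itself. The variable names and sample data are illustrative assumptions.

```python
bytes_string = "naïve café".encode("utf-8")  # illustrative input

text = bytes_string.decode()          # encoding defaults to 'utf-8'
same = str(bytes_string, "utf-8")     # the constructor form yields the same str

# Any other encoding has to be named explicitly.
latin = b"caf\xe9".decode("latin-1")

print(text, same, latin)
```

Using a name such as text for the result also avoids shadowing the built-in str.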
uname01
13

How to filter (skip) non-UTF8 characters from array?

To address this comment in @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors parameter:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")  
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte

The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result).
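One way to combine these handlers, sketched under the assumption that you want strict decoding when possible and a visible substitute otherwise. decode_lenient is a hypothetical helper, not part of the standard library.

```python
def decode_lenient(data, encoding="utf-8"):
    """Decode strictly first; on failure, fall back to errors='replace'.

    Returns (text, clean), where clean is False if any byte was replaced
    with U+FFFD. Hypothetical helper for illustration only.
    """
    try:
        return data.decode(encoding, errors="strict"), True
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace"), False


print(decode_lenient(b"abc"))
print(decode_lenient(b"\x80abc"))
```

Returning the clean flag lets the caller decide whether silently altered text is acceptable, which 'ignore' alone would hide.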

pylang
12

UPDATED: to drop the leading b and the quotes at the start and end.

How to convert bytes, as displayed, to a string, even in unusual situations.

Since your data may contain bytes that are not valid in the 'utf-8' encoding, you can use just str without any additional parameters:

some_bad_bytes = b'\x02-\xdfI#)'
text = str(some_bad_bytes)[2:-1]  # strip the leading b' and the trailing '

print(text)
Output: \x02-\xdfI#)

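If you pass the 'utf-8' parameter with these specific bytes, you get a UnicodeDecodeError.

Note that text is an ordinary Python 3 str, but it holds the escaped representation of the bytes (a literal backslash followed by x02, and so on), not the decoded characters.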
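For comparison, a hedged sketch of what the slicing trick produces next to two decode-based alternatives. The alternatives are not part of this answer; they use only the standard errors handlers and the latin-1 codec.

```python
some_bad_bytes = b'\x02-\xdfI#)'

# Slicing the repr: every escape sequence becomes literal characters.
sliced = str(some_bad_bytes)[2:-1]

# backslashreplace: valid bytes are decoded, only invalid ones are escaped.
escaped = some_bad_bytes.decode("utf-8", errors="backslashreplace")

# latin-1 maps each byte to one code point, so it never fails and round-trips.
lossless = some_bad_bytes.decode("latin-1")

print(len(sliced), len(escaped), len(lossless))
```

Here sliced contains a literal backslash where the original had the control byte 0x02, while escaped and lossless keep that byte as a real character.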

Community
Seyfi
  • result is "b'\\x02-\\xdfI#)'" which probably isn't what he wants – Glen Thompson Oct 18 '17 at 20:49
  • @GlenThompson it is just an example of unwanted input that may occur. I used this specific text intentionally. If you mean that the text starts with a `b`, I have updated the answer. – Seyfi Oct 19 '17 at 20:46
  • Thank you very much. I was searching for a way to remove the b'' from a string containing ANSI characters without encoding it and losing the characters. I'm new to Python and didn't know that I could trim a sequence at the start and end using indexes :O – Diego Fernando Murillo Valenci Feb 09 '18 at 20:17
  • @DiegoFernandoMurilloValenci, you're welcome. Glad I could help. – Seyfi Mar 01 '18 at 20:58