How do I get rid of the b-prefix in a string in Python?

Question

I have a string with a b-prefix:

b'I posted a new photo to Facebook'

I gather the b indicates it is a byte string.

How do I remove this b prefix? I tried:

b'I posted a new photo to Facebook'.encode("utf-8").decode("utf-8")

But this gives an error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>

Possible duplicate of [Suppress/ print without b' prefix for bytes in Python 3](https://stackoverflow.com/questions/16748083/suppress-print-without-b-prefix-for-bytes-in-python-3) — wesinat0r, Dec 11 '18 at 07:11

score 226 · Accepted Answer · edited Apr 24 '22 at 01:30

Decode the bytes to produce a str:

b = b'1234'
print(b.decode('utf-8'))  # prints 1234

edited Apr 24 '22 at 01:30

Mateen Ulhaq

answered Jan 29 '17 at 08:09

hiro protagonist

I've updated the question. I don't think this method works. If it does, could you elaborate why? – Stan Shunpike Jan 31 '17 at 00:16
`.encode("utf-8").decode("utf-8")` does absolutely nothing (if it works at all)... you are on python 3, right? py3 has a strong distinction between `bytes` and `str`. something in your code seems to use the `cp1252` encoding... you could try to open your file with `open(..., mode='w', encoding='utf-8')` and only write `str` to the file; or you forget about all the encoding and write the file in binary: `open(..., mode='wb')` (note the `b`) and only write `bytes`. does that help? – hiro protagonist Jan 31 '17 at 06:48
No, that doesn't fix it. I got `"b'Due to the storms this weekend, we have rescheduled the Blumenfield Bike Ride for Feb 26. Hope to see you there.\xe2\x80\xa6'"` – Stan Shunpike Jan 31 '17 at 07:03
How can you tell it encodes as cp1252? I also didn't think `.encode("utf-8").decode("utf-8")` would do anything, but the people here seemed to think that was the right answer, which it is not as far as i can see. – Stan Shunpike Jan 31 '17 at 07:03
i spotted this path in your traceback: `C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py`. you probably should try to find out how/where that is used. oh, and you are using the `csv.writer`; in that case you need to write `str` indeed and not `bytes`. are you getting things from `requests`? the encoding you get from a web resource may differ from `utf-8`. – hiro protagonist Jan 31 '17 at 08:17
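The exchange above can be illustrated with a minimal sketch. It assumes the bytes are UTF-8 (as the `\xe2\x80\xa6` sequence in the comment suggests) and that the failing text contains a character cp1252 cannot represent; the emoji below is a stand-in chosen only for illustration, not taken from the original data.

```python
# Tail of the bytes quoted in the comment; the last three bytes are UTF-8 for U+2026.
raw = b"Hope to see you there.\xe2\x80\xa6"

text = raw.decode("utf-8")      # bytes -> str, so there is no b'' wrapper left
print(type(text).__name__)      # str
print(text.endswith("\u2026"))  # True

# Assumption: the real text held a character with no cp1252 mapping, e.g. an emoji.
sample = text + " \U0001F600"

try:
    sample.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

utf8_bytes = sample.encode("utf-8")  # UTF-8 can represent any Unicode character
print("cp1252 can encode it:", cp1252_ok)
```

If this matches the situation, the fix is to keep the data as `str` and pick a capable encoding at the point where it is written out, rather than calling `.encode()` early.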

score 27 · Answer 2 · edited Apr 24 '22 at 23:19

The object you are printing is not a string but a bytes object, written as a bytes literal.

Consider creating a bytes object by typing a bytes literal (that is, writing the value directly in source with a b'' prefix) and converting it into a string object by decoding it as UTF-8. (Converting here means decoding.)

byte_object = b"test"  # bytes object created from a bytes literal
print(byte_object)  # prints b'test'
print(byte_object.decode('utf8'))  # prints test, without quotes or prefix

We simply called the .decode('utf8') method.

String literals are described by the following lexical definitions:

https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>

bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::=  shortbyteschar | bytesescapeseq
longbytesitem  ::=  longbyteschar | bytesescapeseq
shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar  ::=  <any ASCII character except "\">
bytesescapeseq ::=  "\" <any ASCII character>
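As a small illustration of the grammar above, the following sketch shows a few of the listed prefixes in use. The variable names are arbitrary.

```python
plain = b"abc"   # bytes literal with a lowercase prefix
upper = B"abc"   # uppercase prefix, same type and value
raw = rb"\n"     # raw bytes: a backslash followed by 'n', two bytes
cooked = b"\n"   # escape processed: a single newline byte

print(type(plain).__name__, plain == upper)  # bytes True
print(len(raw), len(cooked))                 # 2 1
print(type("abc").__name__)                  # str
```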

score 8 · Answer 3 · edited Jun 01 '22 at 17:42

You need to decode it to convert it to a string. Check the answer here about bytes literals in Python 3.

b'I posted a new photo to Facebook'.decode('utf-8')
# 'I posted a new photo to Facebook'

edited Jun 01 '22 at 17:42

cottontail

answered Jan 29 '17 at 08:10

salmanwahed

the problem with this is that, when i try to download tweets without the `encode("utf-8")` I get errors. And, as I mentioned here, http://stackoverflow.com/q/41915383/4422095 removing that didn't solve it. Even if I use the decode as u suggest, I still get an error. I will post that in the post. – Stan Shunpike Jan 29 '17 at 08:26
done. it's not exactly the same because u need twitter OAuth codes to do it. but if u just do the example i gave, u will get the same problem. it is not solved by the method u suggested. it just undoes the utf-8. **but that doesn't work because it won't process the characters in the tweets without utf-8 encoding** – Stan Shunpike Jan 30 '17 at 07:39
You have to use the correct encoding, of course. `utf-8` was an example. – salmanwahed Jan 30 '17 at 08:41
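To make the point about choosing the correct encoding concrete, here is a hedged sketch. The sample bytes are invented for illustration: they were produced with Latin-1, so decoding them as UTF-8 fails.

```python
data = "caf\u00e9".encode("latin-1")  # b'caf\xe9', which is not valid UTF-8

try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

right = data.decode("latin-1")                  # matches how the bytes were produced
lossy = data.decode("utf-8", errors="replace")  # undecodable byte becomes U+FFFD

print(utf8_ok)       # False
print(ascii(right))  # 'caf\xe9'
print(ascii(lossy))  # 'caf\ufffd'
```

`errors="replace"` avoids the exception but loses information, so it is a fallback rather than a substitute for knowing the real encoding.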

score 7 · Answer 4 · edited Jun 01 '22 at 17:41

How to get a string without the b' ' wrapper from base64-decoded data in Python: b64decode returns bytes, so decode the result:

import base64
a = 'cm9vdA=='
b = base64.b64decode(a).decode('utf-8')  # bytes -> str
print(b)  # prints root

edited Jun 01 '22 at 17:41

cottontail

answered Sep 05 '18 at 07:57

Avinash Chougule

score 3 · Answer 5 · edited Jun 01 '22 at 17:46

On Python 3.6 with Django 2.0, decoding did not behave as I expected. I get the right result when I print the decoded value, but the b'value' wrapper still shows up in the rendered output.

This is what I'm encoding:

'uid': urlsafe_base64_encode(force_bytes(user.pk)),

This is what I'm decoding:

uid = force_text(urlsafe_base64_decode(uidb64))

This is what the Django 2.0 documentation says:

urlsafe_base64_encode(s)

Encodes a bytestring in base64 for use in URLs, stripping any trailing equal signs.

urlsafe_base64_decode(s)

Decodes a base64 encoded string, adding back any trailing equal signs that might have been stripped.

This is my account_activation_email_test.html file

{% autoescape off %}
Hi {{ user.username }},

Please click on the link below to confirm your registration:

http://{{ domain }}{% url 'accounts:activate' uidb64=uid token=token %}
{% endautoescape %}

This is my console response:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Activate Your MySite Account
From: webmaster@localhost
To: testuser@yahoo.com
Date: Fri, 20 Apr 2018 06:26:46 -0000
Message-ID: <152420560682.16725.4597194169307598579@Dash-U>

Hi testuser,

Please click on the link below to confirm your registration:
http://127.0.0.1:8000/activate/b'MjU'/4vi-fasdtRf2db2989413ba/

as you can see uid = b'MjU'

expected uid = MjU

test in console:

$ python
Python 3.6.4 (default, Apr  7 2018, 00:45:33) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.utils.http import urlsafe_base64_encode, urlsafe_base64_decode
>>> from django.utils.encoding import force_bytes, force_text
>>> var1=urlsafe_base64_encode(force_bytes(3))
>>> print(var1)
b'Mw'
>>> print(var1.decode())
Mw
>>>

After investigating, it seems to be related to Python 3. My workaround was quite simple:

'uid': user.pk,

I receive it as uidb64 on my activate function:

user = User.objects.get(pk=uidb64)

and voila:

Content-Transfer-Encoding: 7bit
Subject: Activate Your MySite Account
From: webmaster@localhost
To: testuser@yahoo.com
Date: Fri, 20 Apr 2018 20:44:46 -0000
Message-ID: <152425708646.11228.13738465662759110946@Dash-U>


Hi testuser,

Please click on the link below to confirm your registration:

http://127.0.0.1:8000/activate/45/4vi-3895fbb6b74016ad1882/

now it works fine.

I believe the problem is not the decode but the autoescape off in the template, which does not turn the bytes object into a string the way decode does. – Fernando D Jaime, Apr 20 '18 at 15:46
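One possible reading of the behaviour above is that the bytes value was rendered into the URL without being decoded first. The sketch below reproduces that effect with the standard library only: `base64.urlsafe_b64encode` plus `rstrip(b"=")` is an assumed stand-in for Django's `urlsafe_base64_encode`, and the primary key and URL pattern are hypothetical.

```python
import base64

pk = 25  # hypothetical primary key

# Stand-in for the Django helper: URL-safe base64 with the padding stripped.
uid_bytes = base64.urlsafe_b64encode(str(pk).encode("utf-8")).rstrip(b"=")

# Formatting a bytes object into text uses its repr, so the b'' wrapper leaks in.
bad_url = "/activate/{}/".format(uid_bytes)

# Decoding first yields a str, which formats cleanly.
good_url = "/activate/{}/".format(uid_bytes.decode("ascii"))

print(bad_url)   # /activate/b'MjU'/
print(good_url)  # /activate/MjU/
```

Under that assumption, calling `.decode()` on the encoded value before placing it in the template context (as the console session does with `var1.decode()`) would keep the base64 form in the URL.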

score 2 · Answer 6 · edited Jun 01 '22 at 17:42

Assuming you don't want to decode it as others are suggesting here, you can convert it with str() and then strip the leading b' and the trailing '. Note that non-ASCII bytes stay in the result as literal \x escape sequences.

x = "Hi there \ufffd"
x = x.encode("utf-8")
x  # b'Hi there \xef\xbf\xbd'
str(x)[2:-1]
# 'Hi there \\xef\\xbf\\xbd'

edited Jun 01 '22 at 17:42

cottontail

answered Feb 21 '20 at 03:46

Joseph Boyd

score 1 · Answer 7 · answered Apr 26 '17 at 16:58

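I got it working by encoding only the output as UTF-8. Here is the code example:

import csv

# api, user and file_name are defined elsewhere
new_tweets = api.GetUserTimeline(screen_name=user, count=200)
result = new_tweets[0]
try:
    text = result.text
except AttributeError:
    text = ''

# let open() handle the encoding; newline='' is what the csv module expects
with open(file_name, 'a', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([text])  # writerows(text) would write one character per row

That is, do not encode when collecting data from the API; encode only on output (print or write).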
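A self-contained variant of the same idea, runnable without API credentials, might look like the sketch below. The tweet texts and the temporary file path are hypothetical placeholders.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for tweet texts; kept as str and never encoded by hand.
texts = ["I posted a new photo to Facebook", "Hope to see you there.\u2026"]

path = os.path.join(tempfile.mkdtemp(), "tweets.csv")

# encoding= makes the file object do the str -> bytes conversion on write.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for text in texts:
        writer.writerow([text])  # one-element row per tweet

# Read the file back with the same encoding to confirm a clean round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows == [[t] for t in texts])  # True
```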

score 1 · Answer 8 · answered Jun 01 '22 at 11:56

In addition to @hiro protagonist's answer, you can convert bytes to a string by passing the character set to str:

b = b'1234'
str(b,'utf-8') # '1234'
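For contrast, a short sketch of what happens when the encoding argument is left out: `str()` then returns the printable representation of the bytes object, which is a common source of a stray b'' prefix inside strings.

```python
b = b"1234"

wrong = str(b)            # representation of the bytes object, prefix included
right = str(b, "utf-8")   # decoded text
same = b.decode("utf-8")  # equivalent to the line above

print(wrong)          # b'1234'
print(right)          # 1234
print(right == same)  # True
```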

answered Jun 01 '22 at 11:56

Neinmonarch

score -2 · Answer 9 · edited Feb 20 '18 at 08:09

Although the question is very old, I think this may be helpful to anyone facing the same problem. Here the text is a string like the one below:

text= "b'I posted a new photo to Facebook'"

Thus you cannot remove the b by decoding it, because it is not a bytes object. I did the following to remove it.

cleaned_text = text.split("b'")[1]

which will give "I posted a new photo to Facebook"

edited Feb 20 '18 at 08:09

Dmitriy

answered Feb 20 '18 at 07:45

Kamol Roy

No, that will give `"I posted a new photo to Facebook'"`. This is not what the question is about, anyway. – tripleee Feb 20 '18 at 08:24
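If the value really is a str holding the text form of a bytes literal, as this answer assumes, one alternative that avoids the leftover quote is to parse the literal with `ast.literal_eval` and then decode. This is a sketch under that assumption, not something proposed in the thread.

```python
import ast

text = "b'I posted a new photo to Facebook'"

value = ast.literal_eval(text)   # parses the literal back into a bytes object
cleaned = value.decode("utf-8")  # bytes -> str

print(type(value).__name__)  # bytes
print(cleaned)               # I posted a new photo to Facebook
```

`ast.literal_eval` only accepts Python literals, so it raises an exception for input that is not one instead of executing it.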
