19

I was wondering whether it is possible to convert a byte string that I got from reading a file to a string (so that type(output) == str). All I've found on Google so far are answers like "How do you base-64 encode a PNG image for use in a data-uri in a CSS file?", which seem like they would work in Python 2 (where, if I'm not mistaken, strings were byte strings anyway) but no longer work in Python 3.4.

The reason I want to convert the resulting byte string to a normal string is that I want to store this base64-encoded data in a JSON object, but I keep getting an error similar to:

TypeError: b'Zm9v' is not JSON serializable

Here's a minimal example of where it goes wrong:

import base64
import json
data = b'foo'
myObj = [base64.b64encode(data)]
json_str = json.dumps(myObj)

So my question is: is there a way to convert this object of type bytes to an object of type str while keeping the base64 encoding (so in this example, I want the result to be ["Zm9v"])?

Joeytje50

4 Answers

14

Try

data = b'foo'.decode('UTF-8')

instead of

data = b'foo'

to convert it into a string.

13

What works for me is to change the b64encode line to:

myObj = [base64.b64encode(data).decode('ascii')]

This is explained in https://stackoverflow.com/a/42776711 :

base64 has been intentionally classified as a binary transform.... It was a design decision in Python 3 to force the separation of bytes and text and prohibit implicit transformations.
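As a minimal sketch of what that separation means for the question's example (standard library only; the variable names `encoded_bytes`, `encoded_text` and `restored` are illustrative and not from the original post): `b64encode` takes bytes and returns bytes, and an explicit `.decode('ascii')` is what turns the result into text that `json.dumps` accepts.

```python
import base64
import json

data = b'foo'

# b64encode is a bytes -> bytes transform in Python 3.
encoded_bytes = base64.b64encode(data)

# Base64 output only contains ASCII characters, so decoding it as ASCII
# converts it to str without changing the base64 text itself.
encoded_text = encoded_bytes.decode('ascii')

json_str = json.dumps([encoded_text])
print(json_str)  # ["Zm9v"]

# Going the other way: b64decode accepts the ASCII str and returns bytes.
restored = base64.b64decode(json.loads(json_str)[0])
```

The original bytes have to stay bytes until after `b64encode` has run; only the encoded result is converted to str.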

The accepted answer doesn't work for me (Python 3.9) and gives the error:

Traceback (most recent call last):
  File "/tmp/x.py", line 4, in <module>
    myObj = [base64.b64encode(data)]
  File "/usr/lib64/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
jmou
  • 1
    I just tried the accepted answer in python 3.6 and it does appear to still work there. Any idea why there would be breaking changes between these two minor versions? Minor version updates shouldn't have breaking changes, so I'm curious to see. Could you test the accepted answer in python3.6 as well, to see if you can reproduce the error there? – Joeytje50 Nov 23 '20 at 22:42
  • Python 3.6 doesn't work for me either:
    ```
    $ docker run --rm -it python:3.6
    Python 3.6.12 (default, Nov 18 2020, 14:46:32)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import base64
    >>> import json
    >>> data = b'foo'.decode('UTF-8')
    >>> myObj = [base64.b64encode(data)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/base64.py", line 58, in b64encode
        encoded = binascii.b2a_base64(s, newline=False)
    TypeError: a bytes-like object is required, not 'str'
    ```
    – jmou Nov 24 '20 at 23:22
  • Sorry for the poor formatting! That's just me running the commands in a Python 3.6 Docker container. – jmou Nov 24 '20 at 23:25
4

Try this:

import base64

def bytes_to_base64_string(value: bytes) -> str:
    # b64encode returns bytes; decode the ASCII-only result to get a str.
    return base64.b64encode(value).decode('ASCII')

There is one common misunderstanding here, especially among people coming from the Java world: the call bytes.decode('ASCII') does not undo the base64 encoding. It only converts the bytes object into a str, and the content is still base64-encoded.
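To illustrate the distinction, here is a short sketch (standard library only; the names `raw`, `b64_bytes`, `b64_text` and `original` are illustrative): `.decode('ascii')` changes the type from bytes to str and leaves the base64 characters unchanged, while `base64.b64decode` is the call that recovers the original data.

```python
import base64

raw = b'foo'

b64_bytes = base64.b64encode(raw)      # base64 data, as bytes
b64_text = b64_bytes.decode('ascii')   # same base64 data, now as str

# Only the type changed; the characters are identical.
print(type(b64_bytes).__name__, type(b64_text).__name__)  # bytes str
print(b64_text == b64_bytes.decode('ascii'))              # True

# Undoing the base64 encoding is a separate step.
original = base64.b64decode(b64_text)
```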

Eugene Gr. Philippov
2

I couldn't find a decent answer that worked for converting bytes to a URL-safe base64-encoded string, so I'm posting my solution here.

Let's say you have an input:

mystring = b'\xab\x8c\xd3\x1fw\xbb\xaaz\xef\x0e\xcb|\xf0\xc3\xdfx=\x16\xeew7\xffU\ri/#\xcf0\x8a2\xa0'

Encode to base64:

from base64 import b64encode  # or urlsafe_b64encode for the URL-safe alphabet
b64_mystring = b64encode(mystring)

This gives b'q4zTH3e7qnrvDst88MPfeD0W7nc3/1UNaS8jzzCKMqA=', which still needs decoding to a str, since bytes are not JSON serializable:

import requests
requests.get("https://google.com", json={"this": b64_mystring})

# raises "TypeError: Object of type bytes is not JSON serializable"

Hence we use:

from base64 import b64encode
b64_mystring = b64encode(mystring).decode("utf-8")

This gives us: q4zTH3e7qnrvDst88MPfeD0W7nc3/1UNaS8jzzCKMqA=

which is now JSON serializable (using json.dumps).
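For the URL-safe variant mentioned at the start of this answer, a sketch along the same lines might look like the following. It assumes the same `mystring` input as above and uses `urlsafe_b64encode` from the standard library, which substitutes `-` for `+` and `_` for `/`; the names `standard`, `urlsafe`, `payload` and `round_tripped` are illustrative.

```python
import json
from base64 import b64encode, urlsafe_b64encode, urlsafe_b64decode

mystring = b'\xab\x8c\xd3\x1fw\xbb\xaaz\xef\x0e\xcb|\xf0\xc3\xdfx=\x16\xeew7\xffU\ri/#\xcf0\x8a2\xa0'

# Standard alphabet (may contain '+' and '/').
standard = b64encode(mystring).decode('ascii')

# URL-safe alphabet ('-' and '_' instead of '+' and '/').
urlsafe = urlsafe_b64encode(mystring).decode('ascii')

# Both are plain str objects, so json.dumps accepts them.
payload = json.dumps({"this": urlsafe})
print(payload)

# Decoding with the matching function recovers the original bytes.
round_tripped = urlsafe_b64decode(urlsafe)
```

Note that the padding character `=` is kept by both functions, so it may still need percent-encoding if the value is placed directly in a URL.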

Yash Nag