62

I'm writing a script to automate data generation for a demo and I need to serialize in a JSON some data. Part of this data is an image, so I encoded it in base64, but when I try to run my script I get:

Traceback (most recent call last):
  File "lazyAutomationScript.py", line 113, in <module>
    json.dump(out_dict, outfile)
  File "/usr/lib/python3.4/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/usr/lib/python3.4/json/encoder.py", line 422, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 429, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
  TypeError: b'iVBORw0KGgoAAAANSUhEUgAADWcAABRACAYAAABf7ZytAAAABGdB...
     ...
   BF2jhLaJNmRwAAAAAElFTkSuQmCC' is not JSON serializable

As far as I know, a base64-encoded-whatever (a PNG image, in this case) is just a string, so it should pose to problem to serializating. What am I missing?

frollo
  • 1,296
  • 1
  • 13
  • 29

3 Answers3

135

You must be careful about the datatypes.

If you read a binary image, you get bytes. If you encode these bytes in base64, you get ... bytes again! (see documentation on b64encode)

json can't handle raw bytes, that's why you get the error.

I have just written some example, with comments, I hope it helps:

from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
IMAGE_NAME = 'spam.jpg'
JSON_NAME = 'output.json'

# first: reading the binary stuff
# note the 'rb' flag
# result: bytes
with open(IMAGE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64 encode read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: string (in utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: doing stuff with the data
# result here: some dict
raw_data = {IMAGE_NAME: base64_string}

# now: encoding the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: writing the json string to disk
# note the 'w' flag, no 'b' needed as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)
spky
  • 2,123
  • 2
  • 17
  • 21
  • I had a similar problem when I was using Gmail API to send Email with this specific action `return {'raw': base64.urlsafe_b64encode(message.as_string())}`. @spky Thanks for your answer! – InamTaj Sep 28 '16 at 10:52
  • 1
    I am doing the same for excel file, everything is happening correctly but file written to disk is corrupted, could not be opened normally – akash karothiya Jan 18 '17 at 11:34
  • 1
    Thank you so much. This was incredibly confusing to me and this explains it really well. – jStaff Aug 07 '19 at 19:05
  • 1
    @spky In the above we are sending the string image. How to again convert back that string to image – Naren Babu R Oct 13 '20 at 05:49
  • @spky any idea, how to read the image back using OpenCV from this json_data. – Surender Singh Nov 19 '21 at 14:11
11

Alternative solution would be encoding stuff on the fly with a custom encoder:

import json
from base64 import b64encode

class Base64Encoder(json.JSONEncoder):
    # pylint: disable=method-hidden
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode()
        return json.JSONEncoder.default(self, o)

Having that defined you can do:

m = {'key': b'\x9c\x13\xff\x00'}
json.dumps(m, cls=Base64Encoder)

It will produce:

'{"key": "nBP/AA=="}'
ssubbotin
  • 536
  • 5
  • 6
9

What am I missing?

The error is yelling that a binary is not JSON serializable.

from base64 import b64encode

# *binary representation* of the base64 string
assert b64encode(b"binary content")                 == b'YmluYXJ5IGNvbnRlbnQ='

# base64 string
assert b64encode(b"binary content").decode('utf-8') ==  'YmluYXJ5IGNvbnRlbnQ='

The latter is definitely "JSON serializable" because is the base64 string representation of the binary b"binary content".

Filippo Vitale
  • 7,597
  • 3
  • 58
  • 64