My server will send JSON, serialized as a string, through a socket to a client machine. I'll take my final JSON and do this:

import json
python_dict_obj = { "id" : 1001, "name" : "something", "file" : <???> }
serialized_json_str = json.dumps(python_dict_obj)

I'd like one of the fields in my JSON to hold the contents of a file, encoded as a string.

Performance-wise (but also interoperability-wise), what is the best way to encode a file using Python? Base64? Binary? Just the raw string text?

EDIT - For those suggesting base64, something like this?

# get file (opened in binary mode so the bytes are read unchanged)
import base64
import json

with open(filename, 'rb') as f:
    filecontents = f.read()
# b64encode returns bytes on Python 3; decode to str so json.dumps accepts it
encoded = base64.b64encode(filecontents).decode('ascii')
python_dict_obj['file'] = encoded
serialized_json_str = json.dumps(python_dict_obj)

# ... sent to client via socket

# decoding
json_again = json.loads(serialized_json_str)
filecontents_again = base64.b64decode(json_again['file'])
lollercoaster
  • In Python 3.5, `b64encode` returns bytes, so I needed one more step to get a string into my dict: `python_dict_obj['file'] = encoded.decode()`. Otherwise, the value was a bytes object `b'something'`, which caused an error during `json.dumps`. – Adam Loving Oct 05 '18 at 17:29

2 Answers


I'd use base64. JSON isn't designed to communicate binary data. So unless your file's content is vanilla text, it "should be" encoded to use vanilla text. Virtually everything can encode and decode base64. If you instead use (for example) Python's repr(file_content), that also produces "plain text", but the receiving end would need to know how to decode the string escapes Python's repr() uses.
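
As a rough illustration of the point above, here is a minimal Python 3 sketch. The byte string `raw` is a made-up stand-in for a binary file, and the dict layout simply mirrors the question; neither is a prescribed format.

```python
import base64
import json

# Hypothetical payload: bytes that are not valid text (stands in for a binary file).
raw = bytes([0, 255, 16, 128]) + b"binary file data"

# json.dumps cannot serialize a bytes object directly on Python 3.
try:
    json.dumps({"file": raw})
    direct_ok = True
except TypeError:
    direct_ok = False

# Base64 turns the bytes into ASCII text that any JSON library can carry.
message = json.dumps({"id": 1001, "file": base64.b64encode(raw).decode("ascii")})

# Receiving side: parse the JSON first, then undo the base64 step.
restored = base64.b64decode(json.loads(message)["file"])
```

Here `direct_ok` ends up `False` and `restored` equals `raw`, so the only thing the receiver needs to know is that the `file` field is base64.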

Tim Peters
  • Well, on the decoding side, you'd get `filecontents` from some spelling of `json.loads()`. After that, you're done with JSON. The `base64` decoder applied to `filecontents` will give you back the original binary file contents. Try it! This is easier to do than to explain ;-) – Tim Peters Oct 03 '13 at 03:07
  • Oops! After your latest edit, I think you figured it out :-) – Tim Peters Oct 03 '13 at 03:08

JSON cannot handle binary data. You will need to encode the data as text before serializing, and the easiest encoding to use is Base64. You do not need the URL-safe form of the encoding unless something further down the processing chain requires it.
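
To make the standard versus URL-safe distinction concrete, here is a small sketch using Python's `base64` module. The three input bytes are chosen purely so that the output contains the two characters on which the alphabets differ.

```python
import base64
import json

# Illustrative input, picked so the encoded text uses alphabet positions 62 and 63.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data).decode("ascii")          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # uses '-' and '_'

# Both forms are ordinary ASCII, so either can sit inside a JSON string.
as_json = json.dumps({"file": standard})

# Each form must be decoded with the matching function.
assert base64.b64decode(standard) == data
assert base64.urlsafe_b64decode(url_safe) == data
```

The standard form only becomes a problem when the encoded text is later placed somewhere that treats `+` or `/` specially, such as a URL or a file name.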

Ignacio Vazquez-Abrams
  • Is it also worth base64 encoding the entire JSON string as well? – lollercoaster Oct 03 '13 at 03:06
  • Only if something further down the chain requires it. But JSON is plain text regardless (except regarding `ensure_ascii`, but that's a different issue which you'll either already know how to handle or can safely ignore). – Ignacio Vazquez-Abrams Oct 03 '13 at 03:10
  • so I wouldn't gain any size reduction by doing that? (base64 encoding file then base64 encoding surrounding json as well) – lollercoaster Oct 03 '13 at 03:11
  • No size reduction: `base64` encoding generally **increases** the number of bytes needed. After all, it's only using 6 of each 8 bits per byte (2**6 == 64, the number of distinct possible values in a `base64` encoding). – Tim Peters Oct 03 '13 at 03:13
  • Encoding as Base64 *increases* the size of the data by about 33% (4 output bytes for every 3 input bytes, plus padding). – Ignacio Vazquez-Abrams Oct 03 '13 at 03:13
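
The size cost discussed in these comments can be checked directly. This is a small sketch with made-up data (`raw` is just 3072 arbitrary bytes standing in for a file), encoded once and then a second time:

```python
import base64

# Hypothetical file content: 3072 arbitrary bytes.
raw = bytes(range(256)) * 12

once = base64.b64encode(raw)    # the file, base64-encoded
twice = base64.b64encode(once)  # base64 applied again on top

# Every 3 input bytes become 4 output bytes, so each pass adds roughly a third.
sizes = (len(raw), len(once), len(twice))
print(sizes)  # (3072, 4096, 5464)
```

Each extra pass makes the payload larger, so wrapping the already-encoded JSON in another layer of base64 only costs bandwidth.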