json encoder different results for json.dump and json.dumps

Question

I have a dictionary in this format:

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}

I want to write it to a file in the following format, converting each list into a dictionary and using the word "value" as the key for each item in the list. For example, {'tandem': ['4210bnd72']} should become:

  "tandem": {
    "value": "4210bnd72"
  }

Here is the expected output file:

{
  "details": {
    "hawk_branch": {
      "tandem": {
        "value": "4210bnd72"
      }
    },
    "uclif_branch": {
      "tandem": {
        "value": "e2nc712nma89",
        "value": "23s24212",
        "value": "12338cm82"
      }
    }
  }
}

I asked a question here, and someone answered that I should subclass json.JSONEncoder:

import json


class restore_value(json.JSONEncoder):
    def encode(self, o):
        if isinstance(o, dict):
            return '{%s}' % ', '.join(': '.join((json.encoder.py_encode_basestring(k), self.encode(v))) for k, v in o.items())
        if isinstance(o, list):
            return '{%s}' % ', '.join('"value": %s' % self.encode(v) for v in o)
        return super().encode(o)

Using the above encoder, if the input is

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}

the output is:

print(json.dumps(d, cls=restore_value))
{"details": {"hawk_branch": {"tandem": {"value": "4210bnd72"}}, "uclif_branch": {"tandem": {"value": "e2nc712nma89", "value": "23s24212", "value": "12338cm82"}}}}

This is exactly what I wanted, but now I want to write it to a file.

with open("a.json", "w") as f:
    json.dump(d, f, cls=restore_value)

But the file isn't written the same way as the json.dumps output.

Expected output:

{"details": {"hawk_branch": {"tandem": {"value": "4210bnd72"}}, "uclif_branch": {"tandem": {"value": "e2nc712nma89", "value": "23s24212", "value": "12338cm82"}}}}

Output I am getting:

{"details": {"hawk_branch": {"tandem": ["4210bnd72"]}, "uclif_branch": {"tandem": ["e2nc712nma89", "23s24212", "12338cm82"]}}}

Can someone please tell me why it's writing to the file differently even though I am using the encoder?

To reproduce, copy and run this with Python 3:

import json


class restore_value(json.JSONEncoder):
    def encode(self, o):
        if isinstance(o, dict):
            return '{%s}' % ', '.join(': '.join((json.encoder.py_encode_basestring(k), self.encode(v))) for k, v in o.items())
        if isinstance(o, list):
            return '{%s}' % ', '.join('"value": %s' % self.encode(v) for v in o)
        return super().encode(o)

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
print(json.dumps(d, cls=restore_value))


with open("a.json", "w") as f:
    json.dump(d, f, cls=restore_value)

score 6 · Accepted Answer · answered Oct 23 '18 at 00:05

The reason is here:

If you look at the source code of json/__init__.py in CPython (Lib/json on GitHub: https://github.com/python/cpython/blob/master/Lib/json/__init__.py), you'll find that json.dump actually uses:

if (not skipkeys and ensure_ascii and
    check_circular and allow_nan and
    cls is None and indent is None and separators is None and
    default is None and not sort_keys and not kw):
    iterable = _default_encoder.iterencode(obj)
else:
    if cls is None:
        cls = JSONEncoder
    iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators,
        default=default, sort_keys=sort_keys, **kw).iterencode(obj)
# could accelerate with writelines in some versions of Python, at
# a debuggability cost
for chunk in iterable:
    fp.write(chunk)
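A quick way to confirm this is a tracing encoder that records which of its methods each function calls. This is a sketch added for illustration, not part of the original answer; TracingEncoder is a hypothetical name. It simply forwards to the stock JSONEncoder methods.

```python
import io
import json


class TracingEncoder(json.JSONEncoder):
    """Hypothetical encoder that logs which entry point json uses."""

    calls = []  # class-level log, kept simple for the demo

    def encode(self, o):
        TracingEncoder.calls.append("encode")
        return super().encode(o)

    def iterencode(self, o, _one_shot=False):
        TracingEncoder.calls.append("iterencode")
        return super().iterencode(o, _one_shot)


json.dumps({"a": [1]}, cls=TracingEncoder)
print(TracingEncoder.calls)  # ['encode', 'iterencode']

TracingEncoder.calls.clear()
json.dump({"a": [1]}, io.StringIO(), cls=TracingEncoder)
print(TracingEncoder.calls)  # ['iterencode']
```

json.dumps goes through encode (whose default implementation then calls iterencode), while json.dump calls iterencode directly. An override of encode alone is therefore never reached by json.dump.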

Hence the method you want to override is json.JSONEncoder.iterencode, not encode: json.dump only calls iterencode, while json.dumps calls encode.
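Here is a minimal sketch of that idea. It assumes the same output rules as the question's encoder: dict keys are strings, and each list becomes an object whose keys are all "value". The logic moves from encode into a generator-based iterencode. Because the inherited encode still delegates to iterencode, json.dumps and json.dump should now produce the same text. The override also accepts the _one_shot argument that encode passes in.

```python
import io
import json


class restore_value(json.JSONEncoder):
    """Sketch: lists are written as {"value": ..., "value": ...} objects."""

    def iterencode(self, o, _one_shot=False):
        if isinstance(o, dict):
            yield "{"
            for i, (k, v) in enumerate(o.items()):
                if i:
                    yield ", "
                # keys are assumed to be strings, as in the question
                yield json.encoder.py_encode_basestring(k) + ": "
                yield from self.iterencode(v)
            yield "}"
        elif isinstance(o, list):
            yield "{"
            for i, v in enumerate(o):
                if i:
                    yield ", "
                yield '"value": '
                yield from self.iterencode(v)
            yield "}"
        else:
            # strings, numbers, None, ...: fall back to the stock encoder
            yield from super().iterencode(o, _one_shot)


d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
buf = io.StringIO()  # stands in for open("a.json", "w")
json.dump(d, buf, cls=restore_value)
print(buf.getvalue() == json.dumps(d, cls=restore_value))  # True
```

This version writes dicts and lists itself, so it ignores indent, sort_keys and separators for them. That is why the indentation is lost.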

Thank you, this works, but I lose all the indentation. Is there a way to preserve it? — MaverickD, Oct 23 '18 at 00:52
What do you mean by `indent`? By default json files are not indented and should be a single line. — Rocky Li, Oct 23 '18 at 00:55
This worked but I had a small issue with an unknown parameter (`_one_shot`) being passed to `iterencode`. The solution was to add `**kwargs` as a parameter to my `iterencode` def. — JimmyJames, Nov 22 '20 at 16:51
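The indentation question can be addressed with an indent-aware variant. This is a hedged sketch, not code from the thread. restore_value_indented is a hypothetical name, keys are assumed to be strings, and it honours only indent and separators (not sort_keys). It passes a private _level argument through the recursive iterencode calls, similar to how the stdlib's pure-Python encoder tracks nesting depth.

```python
import json


class restore_value_indented(json.JSONEncoder):
    """Hypothetical variant: lists become {"value": ...} objects and the
    indent argument of json.dump()/json.dumps() is respected."""

    def iterencode(self, o, _one_shot=False, _level=0):
        if isinstance(o, dict):
            # keys are assumed to be strings, as in the question
            pairs = [(json.encoder.py_encode_basestring(k), v) for k, v in o.items()]
        elif isinstance(o, list):
            pairs = [('"value"', v) for v in o]
        else:
            # scalars: defer to the stock encoder
            yield from super().iterencode(o, _one_shot)
            return
        if not pairs:
            yield "{}"
            return
        indent = self.indent
        if isinstance(indent, int):
            indent = " " * indent  # json also accepts a number of spaces
        if indent is None:
            opener, sep, closer = "{", self.item_separator, "}"
        else:
            inner = "\n" + indent * (_level + 1)
            opener = "{" + inner
            sep = self.item_separator + inner
            closer = "\n" + indent * _level + "}"
        yield opener
        for i, (key, value) in enumerate(pairs):
            if i:
                yield sep
            yield key + self.key_separator
            yield from self.iterencode(value, _level=_level + 1)
        yield closer


d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
print(json.dumps(d, cls=restore_value_indented, indent=2))
with open("a.json", "w") as f:
    json.dump(d, f, cls=restore_value_indented, indent=2)
```

With indent=2 this should print the nested, indented layout the question asks for. Without indent it falls back to the single-line form.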

score 3 · Answer 2 · answered Oct 23 '18 at 00:19


json.dumps with cls will call the encode method on your JSON object, which will return the string representation. json.dump, on the other hand, will call the default method which you have not implemented. From the json.dump docs:

To use a custom JSONEncoder subclass (e.g. one that overrides the default() method to serialize additional types), specify it with the cls kwarg; otherwise JSONEncoder is used.

Therefore, json.dump is using the default default method which doesn't affect your original object and writes that.

The simplest way to write your file the way you want it is:

with open("a.json", "w") as f:
    f.write(json.dumps(d, cls=restore_value))

Tyler Zeller

Thank you, this makes sense. Is it possible to use `indent` as well with this approach? I am losing all the indentation. – MaverickD Oct 23 '18 at 00:51
The only downside to this is that if OP's dict is very large this might not work due to how `iterencode` works as a generator and is able to divide itself into chunks to avoid memory problems. – Rocky Li Oct 23 '18 at 00:54
