My question is similar to this previous question, but my dictionary has a nested structure like the following:

data_dict = {
  'refresh_count': 1,
  'fetch_date': '10-10-2019',
  'modified_date': '',
  'data': [
      {'date': '10-10-2019', 'title': 'Hello1'}, 
      {'date': '11-10-2019', 'title': 'Hello2'}
  ]
}

I would like to store it as indented JSON, but with each dictionary in the data list kept on a single line. Something like:

{
  'refresh_count': 1,
  'fetch_date': '10-10-2019',
  'modified_date': '',
  'data': [
      {'date': '10-10-2019', 'title': 'Hello1'}, 
      {'date': '11-10-2019', 'title': 'Hello2'}
  ]
}

I cannot achieve this using json.dumps (or json.dump) alone, or with the solution from the previous question:

json.dumps(data_dict, indent=2)

>> {
  "refresh_count": 1,
  "fetch_date": "10-10-2019",
  "modified_date": "",
  "data": [
    {
      "date": "10-10-2019",
      "title": "Hello1"
    },
    {
      "date": "11-10-2019",
      "title": "Hello2"
    }
  ]
}

titipata

1 Answer

This is quite a hack, but you can implement a custom JSON encoder that does what you want (see Custom JSON Encoder in Python With Precomputed Literal JSON). Wrap any object that you do not want indented in the NoIndent class. In its default() method, the custom encoder detects this type, stores the object's unindented JSON in self._literal, and returns a unique placeholder string (__N__, where N is the index into self._literal). Later, in the call to encode(), each quoted placeholder is replaced with the corresponding unindented JSON.

Note that you need to choose a placeholder format that cannot possibly appear in the encoded data; otherwise a legitimate string value could be replaced unintentionally.

import json


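class NoIndent:
    """Wrapper marking a value that should be serialized on a single line."""

    def __init__(self, o):
        self.o = o


class MyEncoder(json.JSONEncoder):

    def __init__(self, *args, **kwargs):
        super(MyEncoder, self).__init__(*args, **kwargs)
        # Unindented JSON for each NoIndent object, in the order encountered.
        self._literal = []

    def default(self, o):
        # default() is called for objects the base encoder cannot serialize.
        if isinstance(o, NoIndent):
            i = len(self._literal)
            self._literal.append(json.dumps(o.o))
            # Emit a placeholder string; encode() swaps it out later.
            return '__%d__' % i
        else:
            return super(MyEncoder, self).default(o)

    def encode(self, o):
        s = super(MyEncoder, self).encode(o)
        # Replace each quoted placeholder with its precomputed literal JSON.
        for i, literal in enumerate(self._literal):
            s = s.replace('"__%d__"' % i, literal)
        return s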


data_dict = {
  'refresh_count': 1,
  'fetch_date': '10-10-2019',
  'modified_date': '',
  'data': [
      NoIndent({'date': '10-10-2019', 'title': 'Hello1'}),
      NoIndent({'date': '11-10-2019', 'title': 'Hello2'}),
  ]
}

s = json.dumps(data_dict, indent=2, cls=MyEncoder)
print(s)

Intermediate representation returned by super(MyEncoder, self).encode(o) (key order may differ depending on the Python version):

{
  "fetch_date": "10-10-2019", 
  "refresh_count": 1, 
  "data": [
    "__0__", 
    "__1__"
  ], 
  "modified_date": ""
}

Final output:

{
  "fetch_date": "10-10-2019", 
  "refresh_count": 1, 
  "data": [
    {"date": "10-10-2019", "title": "Hello1"}, 
    {"date": "11-10-2019", "title": "Hello2"}
  ], 
  "modified_date": ""
}
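
To illustrate the note above about placeholder collisions, here is a minimal sketch of a variant that builds its placeholders from a random per-encoder prefix. The class name UniqueTokenEncoder, the @@...@@ token format, and the use of uuid are my own assumptions for illustration, not part of the original approach; it is written for Python 3.

```python
import json
import uuid


class NoIndent:
    """Wrapper marking a value that should be serialized on a single line."""

    def __init__(self, value):
        self.value = value


class UniqueTokenEncoder(json.JSONEncoder):
    """Hypothetical variant of MyEncoder with hard-to-collide placeholders."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Random prefix, regenerated for every encoder instance.
        self._prefix = uuid.uuid4().hex
        # Maps each placeholder token to its precomputed single-line JSON.
        self._literal = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            token = '@@%s_%d@@' % (self._prefix, len(self._literal))
            self._literal[token] = json.dumps(o.value)
            return token
        return super().default(o)

    def encode(self, o):
        s = super().encode(o)
        # Swap each quoted placeholder for its single-line JSON.
        for token, literal in self._literal.items():
            s = s.replace('"%s"' % token, literal)
        return s


data_dict = {
    'refresh_count': 1,
    'fetch_date': '10-10-2019',
    'modified_date': '',
    'data': [
        NoIndent({'date': '10-10-2019', 'title': 'Hello1'}),
        NoIndent({'date': '11-10-2019', 'title': 'Hello2'}),
    ],
}

s = json.dumps(data_dict, indent=2, cls=UniqueTokenEncoder)
print(s)

# A string value that looks like the simpler "__0__" placeholder is left
# untouched, because this encoder only replaces its own random tokens.
tricky = {'label': '__0__', 'row': NoIndent({'a': 1})}
t = json.dumps(tricky, indent=2, cls=UniqueTokenEncoder)
print(t)
```

As with the encoder above, the replacement happens in encode(), so this works with json.dumps. json.dump writes the chunks produced by iterencode() directly and never calls encode(), so the placeholders would end up in the file unreplaced; write the string returned by json.dumps to the file instead.
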
chash