Line-length based custom python JSON encoding for serializables

Question

My problem is similar to "Can I implement custom indentation for pretty-printing in Python’s JSON module?" and "How to change json encoding behaviour for serializable python object?", but instead I'd like to collapse a structure onto a single line whenever its entire JSON encoding fits within a configurable line length, in Python 2.x and 3.x. The output is intended for easy-to-read documentation of the JSON structures, rather than debugging. To clarify: the result MUST be valid JSON, and must still allow the regular JSON encoding features such as OrderedDict/sort_keys, default handlers, and so forth.

The solution from custom indentation does not apply, as the individual structures would need to know their serialized lengths in advance, thus adding a NoIndent class doesn't help as every structure might or might not be indented. The solution from changing the behavior of json serializable does not apply as there aren't any (weird) custom overrides on the data structures, they're just regular lists and dicts.

For example, instead of:

{
  "@context": "http://linked.art/ns/context/1/full.jsonld", 
  "id": "http://lod.example.org/museum/ManMadeObject/0", 
  "type": "ManMadeObject", 
  "classified_as": [
    "aat:300033618", 
    "aat:300133025"
  ]
}

I would like to produce:

{
  "@context": "http://linked.art/ns/context/1/full.jsonld", 
  "id": "http://lod.example.org/museum/ManMadeObject/0", 
  "type": "ManMadeObject", 
  "classified_as": ["aat:300033618", "aat:300133025"]
}

This should apply at any level of nesting within the structure, and across any number of nesting levels, until the line length is reached. Thus if there were a list with a single object inside, with a single key/value pair, it would become:

{
  "@context": "http://linked.art/ns/context/1/full.jsonld", 
  "id": "http://lod.example.org/museum/ManMadeObject/0", 
  "type": "ManMadeObject", 
  "classified_as": [{"id": "aat:300033618"}]
}

It seems like a recursive descent parser on the indented output would work, along the lines of @robm's approach to custom indentation, but the complexity seems to quickly approach that of writing a JSON parser and serializer. Otherwise it seems like a very custom JSONEncoder is needed.

Your thoughts appreciated!

Have you tried [`pprint`](https://docs.python.org/2/library/pprint.html)? It works in both Python 2 and 3. — Chiheb Nexus, Jun 29 '17 at 20:45
@ChihebNexus Yeah but it doesn't work as expected for dicts with multiple keys. — cs95, Jun 29 '17 at 20:53
pprint does not work, as the result is not necessarily valid JSON. — Rob Sanderson, Jun 29 '17 at 20:59
@Coldspeed, I think `pprint`'s default behaviour does a good job. See [https://repl.it/JHzW/0](https://repl.it/JHzW/0) — Chiheb Nexus, Jun 29 '17 at 21:00
@RobSanderson That can actually be remedied, if you are willing to accept `pprint`'s output. — cs95, Jun 29 '17 at 21:01
Clarified in the question. The ordering of keys is important for documentation (customized, using OrderedDict to ensure @context, then id, then type). Also some keys are very long, so pprint's typical indentation would look very very strange. If there's a solution using pprint, that's cool, but it should produce /exactly/ as above. — Rob Sanderson, Jun 29 '17 at 21:11

score 0 · Answer 1 · answered Jun 29 '17 at 22:08

Very inefficient, but seems to work so far:

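import re

def _collapse_json(text, collapse):
    # `text` is indented JSON, e.g. from json.dumps(obj, indent=2).
    # Returns a list of output lines.
    lines = text.splitlines()
    if len(lines) < 2:
        return lines  # nothing to collapse
    out = [lines[0]]
    while lines:
        l = lines.pop(0)
        indent = len(re.split(r'\S', l, 1)[0])
        if indent and l.rstrip()[-1] in ['[', '{']:
            # Indented line that opens a block: gather lines up to its closing bracket.
            curr = indent
            temp = []   # block lines with the leading `curr` spaces removed
            stemp = []  # block lines fully stripped, for the one-line form
            while lines and curr <= indent:
                if temp and curr == indent:
                    break  # back at the opening indent: l is the closing line
                temp.append(l[curr:])
                stemp.append(l.strip())
                l = lines.pop(0)
                indent = len(re.split(r'\S', l, 1)[0])
            temp.append(l[curr:])
            stemp.append(l.lstrip())

            short = " " * curr + ''.join(stemp)
            if len(short) < collapse:
                out.append(short)
            else:
                # Too long for one line: keep the block open, but try to collapse its children.
                ntext = '\n'.join(temp)
                nout = _collapse_json(ntext, collapse)
                for no in nout:
                    out.append(" " * curr + no)
        elif indent:
            out.append(l)
    # The final line (the top-level closing bracket) has no indent, so add it here.
    out.append(l)
    return out

def collapse_json(text, collapse):
    return '\n'.join(_collapse_json(text, collapse))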

Happy to accept something else that produces the same output without crawling up and down constantly!
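
One possible way to avoid re-scanning the text, offered as a rough sketch rather than a tested drop-in replacement: serialize recursively, and at each list or dict try the compact one-line form first, expanding only when it does not fit. The names `dumps_collapsed` and `_fmt` are made up for this example. It assumes the data can first be round-tripped through `json.dumps`/`json.loads` (with `object_pairs_hook=OrderedDict`) so that `sort_keys`, `default` handlers and key order are resolved by the standard encoder before formatting. Two differences from the code above are deliberate assumptions: a value is collapsed when its whole line, including indentation, key and trailing comma, is at most `width` characters, and the top-level value may collapse as well.

```python
import json
from collections import OrderedDict


def _fmt(node, width, indent, level, used, tail):
    """Return JSON text for `node` (plain dicts, lists and scalars only).

    used: columns already taken on the first line (indentation plus key)
    tail: 1 if a comma will follow this value on the same line, else 0
    """
    flat = json.dumps(node, separators=(", ", ": "))
    is_container = isinstance(node, (dict, list))
    # Scalars, empty containers, and anything that fits stay on one line.
    if not is_container or not node or used + len(flat) + tail <= width:
        return flat

    pad = " " * (indent * (level + 1))
    if isinstance(node, dict):
        items = [(json.dumps(k) + ": ", v) for k, v in node.items()]
        opener, closer = "{", "}"
    else:
        items = [("", v) for v in node]
        opener, closer = "[", "]"

    lines = []
    last = len(items) - 1
    for i, (key, value) in enumerate(items):
        comma = "," if i < last else ""
        body = _fmt(value, width, indent, level + 1,
                    len(pad) + len(key), len(comma))
        lines.append(pad + key + body + comma)
    return opener + "\n" + "\n".join(lines) + "\n" + " " * (indent * level) + closer


def dumps_collapsed(obj, width=80, indent=2, sort_keys=False, default=None):
    # Let the standard encoder apply sort_keys/default, then reload the result
    # as plain data while preserving key order.
    text = json.dumps(obj, sort_keys=sort_keys, default=default)
    plain = json.loads(text, object_pairs_hook=OrderedDict)
    return _fmt(plain, width, indent, 0, 0, 0)


doc = OrderedDict([
    ("@context", "http://linked.art/ns/context/1/full.jsonld"),
    ("id", "http://lod.example.org/museum/ManMadeObject/0"),
    ("type", "ManMadeObject"),
    ("classified_as", ["aat:300033618", "aat:300133025"]),
])
print(dumps_collapsed(doc, width=80))
```

With `width=80` the `classified_as` list stays on its key's line; with a smaller width such as 40 it is expanded again, one element per line. Strings are re-encoded with the default `ensure_ascii=True`, so non-ASCII characters come out escaped.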
