67

I'm using Python 2.7's json module to encode the following data structure:

data_structure = {
    'layer1': {
        'layer2': {
            'layer3_1': [ long_list_of_stuff ],
            'layer3_2': 'string'
        }
    }
}

My problem is that I'm printing everything out using pretty printing, as follows:

json.dumps(data_structure, indent=2)

Which is great, except I want to indent everything except the content of "layer3_1": it's a massive list of coordinate dictionaries, and giving each value its own line makes pretty printing produce a file with thousands of lines. For example:

{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}

What I really want is something similar to the following:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
      "layer3_2": "string"
    }
  }
}

I hear it's possible to extend the json module: is it possible to turn off indenting only inside the "layer3_1" object? If so, would somebody please tell me how?

Wooble
Rohaq
  • 7
    Your first code snippet is neither JSON nor Python. –  Nov 06 '12 at 10:51
  • Indentation is a matter of printing, not of representation. – Yuval Adam Nov 06 '12 at 10:53
  • For "pretty printing" you mean you're using the `pprint` module? – Bakuriu Nov 06 '12 at 10:55
  • Amended the first snippet to something recognisable. And I'm using `json.dumps(data_structure, indent=2)` - Added that as an example. – Rohaq Nov 06 '12 at 10:57
  • I've posted a solution that works on 2.7 and plays nicely with options such as `sort_keys` and does not have special case implementation for sort order and instead relies on (composition with) `collections.OrderedDict`. – Erik Kaplun Sep 19 '14 at 14:21

12 Answers

28

(Note: The code in this answer only works with json.dumps() which returns a JSON formatted string, but not with json.dump() which writes directly to file-like objects. There's a modified version of it that works with both in my answer to the question Write two-dimensional list to JSON file.)

Updated

Below is a version of my original answer that has been revised several times. Unlike the original, which I posted only to show how to get the first idea in J.F.Sebastian's answer to work and which, like his, returned a non-indented string representation of the object, the latest version returns the wrapped Python object JSON-formatted in isolation.

The keys of each coordinate dict will appear in sorted order, as requested in one of the OP's comments, but only if a sort_keys=True keyword argument is specified in the initial json.dumps() call driving the process. The code also no longer changes the wrapped object's type to a string along the way; in other words, the actual type of the "wrapped" object is now maintained.

I think a misunderstanding of the original intent of my post resulted in a number of folks downvoting it, so, primarily for that reason, I have "fixed" and improved my answer several times. The current version is a hybrid of my original answer coupled with some of the ideas @Erik Allik used in his answer, plus useful feedback from other users shown in the comments below this answer.

The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.

from _ctypes import PyObj_FromPtr
import json
import re

class NoIndent(object):
    """ Value wrapper. """
    def __init__(self, value):
        self.value = value


class MyEncoder(json.JSONEncoder):
    FORMAT_SPEC = '@@{}@@'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))

    def __init__(self, **kwargs):
        # Save copy of any keyword argument values needed for use here.
        self.__sort_keys = kwargs.get('sort_keys', None)
        super(MyEncoder, self).__init__(**kwargs)

    def default(self, obj):
        return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
                else super(MyEncoder, self).default(obj))

    def encode(self, obj):
        format_spec = self.FORMAT_SPEC  # Local var to expedite access.
        json_repr = super(MyEncoder, self).encode(obj)  # Default JSON.

        # Replace any marked-up object ids in the JSON repr with the
        # value returned from the json.dumps() of the corresponding
        # wrapped Python object.
        for match in self.regex.finditer(json_repr):
            # see https://stackoverflow.com/a/15012814/355230
            id = int(match.group(1))
            no_indent = PyObj_FromPtr(id)
            json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)

            # Replace the matched id string with json formatted representation
            # of the corresponding Python object.
            json_repr = json_repr.replace(
                            '"{}"'.format(format_spec.format(id)), json_obj_repr)

        return json_repr


if __name__ == '__main__':
    from string import ascii_lowercase as letters

    data_structure = {
        'layer1': {
            'layer2': {
                'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
                                      {"x":6,"y":9},
                                      {k: v for v, k in enumerate(letters)}]),
                'layer3_2': 'string',
                'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
                                      {"x":6,"y":9,"z":8}]),
                'layer3_4': NoIndent(list(range(20))),
            }
        }
    }

    print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))

Output:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
      "layer3_2": "string",
      "layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
      "layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
  }
}
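As noted at the top of this answer, the code only works with json.dumps(), not json.dump(). If all you need is to write the result to a file, a minimal workaround (my own sketch, not part of the original answer) is to write the returned string yourself:

with open('output.json', 'w') as f:  # hypothetical output path
    f.write(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))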
martineau
  • Nice, I got this working, but wanted to sort the x and y for vanity's sake (parts of the JSON produced needs to be hand edited later on, don't ask why :(), so I tried using an `OrderedDict`. Now my problem is that I get the following in my output: `"layer3_1": "[OrderedDict([('x', 804), ('y', 622)]), OrderedDict([('x', 817), ('y', 635)]), OrderedDict([('x', 817), ('y', 664)]), OrderedDict([('x', 777), (' y', 664)]), OrderedDict([('x', 777), ('y', 622)]), OrderedDict([('x', 804), ('y' , 622)])]",` I think I'm missing something... – Rohaq Nov 06 '12 at 17:21
  • @Rohaq: Realize that you can make the `isinstance(obj, NoIndent)` case do almost whatever you want. Specifically, return a string formatted the way you would like from `obj`. One relatively easy way to implement something like that would be to add a custom `__repr__()` method to the NoIndent class. – martineau Nov 06 '12 at 20:03
  • 7
    This still prints the list as a string instead. – Erik Kaplun Sep 19 '14 at 13:20
  • 1
    @ErikAllik was exactly right. The list became a string: `"[{'x':1, 'y':7}, {'x':0, 'y':4}, {'x':5, 'y':3}, {'x':6, 'y':9}]"`. This is a wrong answer! – AnnieFromTaiwan Feb 16 '16 at 02:16
  • Any idea how to achieve this with Javascript / NodeJS? – chamberlainpi Jun 07 '16 at 05:11
  • 1
    Not working with deserialisation (json.loads()) due to using single quote. I have to use @ErikAllik 's answer instead. -- https://github.com/patarapolw/pyexcel-formatter/blob/master/pyexcel_formatter/serialize.py#L31 – Polv Jul 14 '18 at 09:07
  • 1
    @Polv: Thanks for the feedback. I've updated my answer to address the issue. – martineau Jul 15 '18 at 17:28
  • I tried using it for very long JSON files but it is not scalable, it calls `replace` too many times, with about 30k lines in the JSON it already takes minutes to dump. Any idea on how to improve it? – Gustavo Nov 28 '19 at 13:13
  • @GustavoMaia: 30K lines doesn't seem like that many, so I suspect the bottleneck is somewhere else. – martineau Nov 28 '19 at 15:19
  • @martineau Executing it with a profiler showed exactly that: the replace method is called way too many times and slows down the execution when I run it with that many lines – Gustavo Nov 29 '19 at 16:08
  • @GustavoMaia: Well, it has to be called for everything you want custom formatted — no way around that. – martineau Nov 29 '19 at 16:46
  • Excellent answer. This should be the default Encoder of json package. Have you considered making a pull request to the python library? – Jordan He Sep 23 '20 at 15:20
  • I do have two questions though. 1) I tested that this works for json.dumps, json.loads and json.load, but not json.dump to file. I have to do f.write(json.dumps(...)). How can you make it work for json.dump? 2) Is it absolutely necessary to use a NoIndent mask? Can we specify a 'depth' in the encoder and let it decide what to NoIndent itself? For instance, noIndentDepth=1 means the last layer is always NoIndent, so that in your example 'layer3_1' would split into 5 lines and 'layer3_4' would keep a single line. This seems logical and programmatically possible. – Jordan He Sep 23 '20 at 15:30
  • I have encountered the issue with it not working when dumping to a file before, figured out how to fix that, and posted it in one of my other answers on this site. I don't recall where at the moment. It had something to do with how the `json` module is written. I don't believe there's a way to do it without defining your own class like `NoIndent`, again because of the way the `json` module is implemented: the default way it handles many of the Python native types ([shown here](https://docs.python.org/3/library/json.html#json.JSONEncoder)) is hardcoded into it. – martineau Sep 23 '20 at 15:51
  • @JordanHe: You probably don't care anymore, but I found that [answer](https://stackoverflow.com/a/42721412/355230) of mine I was referring to, where I fixed the code in this answer so it would work with both `json.dumps()` and `json.dump()` (and explained what the problem was). – martineau Apr 19 '21 at 02:16
16

A bodge, but once you have the string from dumps(), you can perform a regular expression substitution on it, if you're sure of the format of its contents. Something along the lines of:

s = json.dumps(data_structure, indent=2)
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)
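For context, here is a minimal, self-contained sketch of that idea applied to the structure from the question (the data literal and print call are mine, not part of the answer); the substitution collapses each two-integer coordinate object onto a single line:

import json
import re

data_structure = {
    "layer1": {
        "layer2": {
            "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4},
                         {"x": 5, "y": 3}, {"x": 6, "y": 9}],
            "layer3_2": "string",
        }
    }
}

s = json.dumps(data_structure, indent=2)
# Match each pretty-printed {"<key>": <int>, "<key>": <int>} block, along with
# its optional trailing comma, and rewrite it in compact form.
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*',
           r'{"\1":\2,"\3":\4}\5', s)
print(s)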
M Somerville
  • Thanks, this worked too, and is indeed smaller, but decided to go with the solution provided by @martineau – Rohaq Nov 07 '12 at 16:37
  • Your solution is very funny!:) I love it, and it doesn't require any "NoIdent" tagging, works out of the box. I'll probably test it for large input files tomorrow, I'm looking for a simple solution to break out of the csv world since it doesn't really allow for metadata, yet keep the readability. – Barney Szabolcs Dec 23 '21 at 22:20
  • hey, amazing answer! I built on your ideas by providing a more generic solution with the following regex: `re.sub(r'(?:\n\s{8,}(.*))|(?:\n\s{6,}(]|}))', r'\1\2', s)`. Or read it at https://regex101.com/r/xWT7I1/2 – Marc Moreaux May 05 '23 at 14:53
13

The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from Custom JSON encoder in Python 2.7 (originally for inserting plain JavaScript code) to keep custom-encoded objects from ending up as JSON strings in the output, using a UUID-based replacement scheme.

import json
import uuid


class NoIndent(object):
    def __init__(self, value):
        self.value = value


class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        self.kwargs = dict(kwargs)
        # The nested json.dumps() calls must not be indented, so drop 'indent'.
        del self.kwargs['indent']
        self._replacement_map = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "@@%s@@" % (key,)
        else:
            return super(NoIndentEncoder, self).default(o)

    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in self._replacement_map.iteritems():
            result = result.replace('"@@%s@@"' % (k,), v)
        return result

Then this

obj = {
  "layer1": {
    "layer2": {
      "layer3_2": "string", 
      "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
    }
  }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

produces the following output:

{
  "layer1": {
    "layer2": {
      "layer3_2": "string", 
      "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
    }
  }
}

It also correctly passes all options (except indent), e.g. sort_keys=True, down to the nested json.dumps call.

obj = {
    "layer1": {
        "layer2": {
            "layer3_1": NoIndent([{"y": 7, "x": 1, }, {"y": 4, "x": 0}, {"y": 3, "x": 5, }, {"y": 9, "x": 6}]),
            "layer3_2": "string",
        }
    }
}    
print json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)

correctly outputs:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}], 
      "layer3_2": "string"
    }
  }
}

It can also be combined with e.g. collections.OrderedDict:

obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
        }
    }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

outputs:

{
  "layer1": {
    "layer2": {
      "layer3_3": {"b": 1, "a": 2}, 
      "layer3_2": "string"
    }
  }
}

UPDATE: In Python 3, there is no iteritems. You can replace encode with this:

def encode(self, o):
    result = super(NoIndentEncoder, self).encode(o)
    for k, v in iter(self._replacement_map.items()):
        result = result.replace('"@@%s@@"' % (k,), v)
    return result
Ehsan
Erik Kaplun
  • 4
    For those who don't understand how this solution works: The two lines `for k, v in self._replacement_map.iteritems(): result = result.replace('"@@%s@@"' % (k,), v)` inside `encode()`, is to replace `"layer3_1": "@@d4e06719f9cb420a82ace98becab5ff8@@"` to `"layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]`. I think this solution in some sense equals to @M Somerville's re substitution solution. – AnnieFromTaiwan Feb 16 '16 at 02:42
  • 5
    This works in Python 3 as well. The only caveat is that you **must** use json.dumps, not json.dump! In the latter case you would have to override iterencode() as well and I couldn't get that working. – letmaik Jun 10 '16 at 13:43
9

This yields the OP's expected result by overriding iterencode() and stripping out the newlines and indentation that the base encoder emits while inside a list:

import json

class MyJSONEncoder(json.JSONEncoder):

  def iterencode(self, o, _one_shot=False):
    list_lvl = 0
    for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
      if s.startswith('['):
        list_lvl += 1
        s = s.replace('\n', '').rstrip()
      elif 0 < list_lvl:
        s = s.replace('\n', '').rstrip()
        if s and s[-1] == ',':
          s = s[:-1] + self.item_separator
        elif s and s[-1] == ':':
          s = s[:-1] + self.key_separator
      if s.endswith(']'):
        list_lvl -= 1
      yield s

o = {
  "layer1":{
    "layer2":{
      "layer3_1":[{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
      "layer3_2":"string",
      "layer3_3":["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
      "layer3_4":"aaa\nbbb",
    }
  }
}

jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
    cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))
SzieberthAdam
3

You could try:

  • mark lists that shouldn't be indented by replacing them with NoIndentList:

    class NoIndentList(list):
        pass
    
  • override the json.JSONEncoder.default method to produce a non-indented string representation for NoIndentList.

    You could just cast it back to list and call json.dumps() without indent to get a single line

It seems the above approach doesn't work for the json module:

import json
import sys

class NoIndent(object):
    def __init__(self, value):
        self.value = value

def default(o, encoder=json.JSONEncoder()):
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    return encoder.default(o)

L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)

It produces incorrect output (the wrapped list is serialized as a string):

[
    "[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]", 
    [
        {
            "y": 0, 
            "x": 0
        }, 
        {
            "y": 1, 
            "x": 0
        }
    ]
]

If you can use yaml then the method works:

import sys
import yaml

class NoIndentList(list):
    pass

def noindent_list_presenter(dumper, data):
    return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                     flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)


obj = [
    [dict(x=x, y=y) for x in range(2) for y in range(1)],
    [dict(x=x, y=y) for x in range(1) for y in range(2)],
    ]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)

It produces:

- [{x: 0, y: 0}, {x: 1, y: 0}]
-   - {x: 0, y: 0}
    - {x: 0, y: 1}

i.e., the first list is serialized using [] and all items are on one line, the second list uses one line per item.

jfs
  • 1
    I think I get half of what you're saying, though I am a little confused. Probably down to me not having to override methods in Python before though. I'll do a bit more reading, but if you could provide a more complete example, it would be appreciated! – Rohaq Nov 06 '12 at 11:36
3

Here's a post-processing solution for when you have too many different types of objects contributing to the JSON to attempt the JSONEncoder method, and too much variation to use a regex. This function collapses whitespace after a specified indent level, without needing to know the specifics of the data itself.

def collapse_json(text, indent=12):
    """Compacts a string of json data by collapsing whitespace after the
    specified indent level

    NOTE: will not produce correct results when indent level is not a multiple
    of the json indent level
    """
    initial = " " * indent
    out = []  # final json output
    sublevel = []  # accumulation list for sublevel entries
    pending = None  # holder for consecutive entries at exact indent level
    for line in text.splitlines():
        if line.startswith(initial):
            if line[indent] == " ":
                # found a line indented further than the indent level, so add
                # it to the sublevel list
                if pending:
                    # the first item in the sublevel will be the pending item
                    # that was the previous line in the json
                    sublevel.append(pending)
                    pending = None
                item = line.strip()
                sublevel.append(item)
                if item.endswith(","):
                    sublevel.append(" ")
            elif sublevel:
                # found a line at the exact indent level *and* we have sublevel
                # items. This means the sublevel items have come to an end
                sublevel.append(line.strip())
                out.append("".join(sublevel))
                sublevel = []
            else:
                # found a line at the exact indent level but no items indented
                # further, so possibly start a new sub-level
                if pending:
                    # if there is already a pending item, it means that
                    # consecutive entries in the json had the exact same
                    # indentation and that last pending item was not the start
                    # of a new sublevel.
                    out.append(pending)
                pending = line.rstrip()
        else:
            if pending:
                # it's possible that an item will be pending but not added to
                # the output yet, so make sure it's not forgotten.
                out.append(pending)
                pending = None
            if sublevel:
                out.append("".join(sublevel))
            out.append(line)
    return "\n".join(out)

For example, using this structure as input to json.dumps with an indent level of 4:

text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)

here's the output of the function at various indent levels:

>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first", 
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first", 
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ], 
            "second": 2, 
            "fourth": 4, 
            "third": 3
        }
    ]
}
robm
3

An answer for myself and other Python 3 users:

import re

def jsonIndentLimit(jsonString, indent, limit):
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)

if __name__ == '__main__':
    jsonString = '''{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}'''
    print(jsonIndentLimit(jsonString, '  ', 3))

'''print
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}'''
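A minimal usage sketch of my own (assuming the dict from the question is stored in a variable named data_structure), along the lines of what the comment below suggests:

import json

# The indent unit passed to jsonIndentLimit must match the one used by json.dumps.
pretty = json.dumps(data_structure, indent=2)
print(jsonIndentLimit(pretty, '  ', 3))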
  • This could be the accepted answer. To pretty-print a dictionary, combine it with json.dumps and it looks like this: `jsonString = json.dumps(thedict, indent=4); print(jsonIndentLimit(jsonString, ' ', 3))` – Jordan He Oct 19 '22 at 03:40
1

Best-performing code (a 10 MB JSON text takes about 1 second):

import json
def dumps_json(data, indent=2, depth=2):
    assert depth > 0
    space = ' '*indent
    s = json.dumps(data, indent=indent)
    lines = s.splitlines()
    N = len(lines)
    # determine which lines to be shortened
    is_over_depth_line = lambda i: i in range(N) and lines[i].startswith(space*(depth+1))
    is_open_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i+1)
    is_close_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i-1)
    # 
    def shorten_line(line_index):
        if not is_open_bracket_line(line_index):
            return lines[line_index]
        # shorten over-depth lines
        start = line_index
        end = start
        while not is_close_bracket_line(end):
            end += 1
        has_trailing_comma = lines[end][-1] == ','
        _lines = [lines[start][-1], *lines[start+1:end], lines[end].replace(',','')]
        d = json.dumps(json.loads(' '.join(_lines)))
        return lines[line_index][:-1] + d + (',' if has_trailing_comma else '')
    # 
    s = '\n'.join([
        shorten_line(i)
        for i in range(N) if not is_over_depth_line(i) and not is_close_bracket_line(i)
    ])
    #
    return s
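A quick usage sketch of my own (the data literal is just the structure from the question); with depth=3, everything nested more than three levels deep should collapse onto single lines, which is roughly the output the OP asked for:

data = {
    "layer1": {
        "layer2": {
            "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4},
                         {"x": 5, "y": 3}, {"x": 6, "y": 9}],
            "layer3_2": "string",
        }
    }
}
print(dumps_json(data, indent=2, depth=3))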

UPDATE: Here's my explanation:

First we use json.dumps to get an indented JSON string. Example:

>>>  print(json.dumps({'0':{'1a':{'2a':None,'2b':None},'1b':{'2':None}}}, indent=2))
[0]  {
[1]    "0": {
[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },
[6]      "1b": {
[7]        "2": null
[8]      }
[9]    }
[10] }

If we set indent=2 and depth=2, then over-depth lines start with 6 spaces.

We have 4 types of lines:

  1. Normal line
  2. Open bracket line (2,6)
  3. Over-depth line (3, 4, 7)
  4. Close bracket line (5,8)

We will try to merge a sequence of lines (type 2 + 3 + 4) into one single line. Example:

[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },

will be merged into:

[2]      "1a": {"2a": null, "2b": null},

NOTE: A close bracket line may have a trailing comma

TRUC Vu
0

This solution is not as elegant and generic as the others, and you will not learn much from it, but it's quick and simple.

def custom_print(data_structure, indent):
    for key, value in data_structure.items():
        print "\n%s%s:" % (' '*indent,str(key)),
        if isinstance(value, dict):
            custom_print(value, indent+1)
        else:
            print "%s" % (str(value)),

Usage and output:

>>> custom_print(data_structure,1)

 layer1:
  layer2:
   layer3_2: string
   layer3_1: [{'y': 7, 'x': 1}, {'y': 4, 'x': 0}, {'y': 3, 'x': 5}, {'y': 9, 'x': 6}]
Bula
0

As a side note, this website has built-in JavaScript that avoids line feeds in the JSON output when lines are shorter than 70 chars:

http://www.csvjson.com/json_beautifier

(was implemented using a modified version of JSON-js)

Select "Inline short arrays"

Great for quickly viewing data that you have in the copy buffer.

kashiraja
0

Indeed, this is one of the things YAML does better than JSON.

I can't get NoIndentEncoder to work, but I can use a regex on the JSON string...

import re

def collapse_json(text, list_length=5):
    for length in range(list_length):
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'

        text = re.sub(re_pattern, re_repl, text)

    return text

The question is, how do I perform this on a nested list?

Before:

[
  0,
  "any",
  [
    2,
    3
  ]
]

After:

[0, "any", [2, 3]]
Polv
0

An alternate method, if you specifically want to format arrays differently, could look something like this:

import json

# Should be unique and never appear in the input
REPLACE_MARK = "#$ONE_LINE_ARRAY_{0}$#"

example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}

# Replace all arrays with the indexed markers.
a = example_json["test_arr"]
b = example_json["test_obj"]["nested_arr"]
example_json["test_arr"] = REPLACE_MARK.format("a")
example_json["test_obj"]["nested_arr"] = REPLACE_MARK.format("b")

# Generate the JSON without any arrays using your pretty print.
json_data = json.dumps(example_json, indent=4)

# Generate the JSON arrays without pretty print.
json_data_a = json.dumps(a)
json_data_b = json.dumps(b)

# Insert the flat JSON strings into the parent at the indexed marks.
json_data = json_data.replace(f"\"{REPLACE_MARK.format('a')}\"", json_data_a)
json_data = json_data.replace(f"\"{REPLACE_MARK.format('b')}\"", json_data_b)

print(json_data)

You could generalize this into a function that would walk through each element of your JSON object scanning for arrays and performing the replacements dynamically.

Pros:

  • Simple and expandable
  • No use of Regex
  • No custom JSON Encoder

Cons:

  • Take care that user input never contains the replacement placeholders.
  • Might not be performant on JSON structures containing lots of arrays.

The motivation for this solution was fixed-format generation of animation frames, where each element of the array was an integer index. This solution worked well for me and was easy to adjust.

Here is a more generic and optimized version:

import json
import copy

REPLACE_MARK = "#$ONE_LINE_ARRAY_$#"

def dump_arrays_single_line(json_data):
    # Deep copy prevents modifying the original data.
    json_data = copy.deepcopy(json_data)

    # Walk the dictionary, putting every JSON array into arr.
    def walk(node, arr):
        for key, item in node.items():
            if type(item) is dict:
                walk(item, arr)
            elif type(item) is list:
                arr.append(item)
                node[key] = REPLACE_MARK
            else:
                pass

    arr = []
    walk(json_data, arr)

    # Pretty-format, but keep arrays on a single line.
    # '{' and '}' must be escaped so the result can be used with 'str.format()';
    # each marker is then replaced with the JSON dump (not the Python repr) of its array.
    json_data = (json.dumps(json_data, indent=4)
                 .replace('{', '{{')
                 .replace('}', '}}')
                 .replace(f'"{REPLACE_MARK}"', "{}", len(arr))
                 .format(*(json.dumps(a) for a in arr)))

    return json_data
                

example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}

print(dump_arrays_single_line(example_json))
AgentM