
I am using Python to flatten and unpack JSON structures. I have already figured out flattening and can flatten JSON files into dictionary structures like this:

# Given the JSON
{
    "a": "thing",
    "b": {
            "c": "foo"
         },
    "array": [
        "item1",
        "item2"
    ]
}

Then flatten() it into:

{
    "a": "thing",
    "b.c": "foo",
    "array.[0]": "item1",
    "array.[1]": "item2"
}
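For reference, here is a minimal sketch of a flatten step that produces keys in this shape. It is not my actual code; the name `flatten` and the `prefix` parameter are just for illustration, and it assumes keys contain no "." and that empty dicts or lists can be dropped.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single-level dict of dotted paths."""
    flat = {}
    if isinstance(obj, dict):
        items = ((str(k), v) for k, v in obj.items())
    elif isinstance(obj, list):
        # List positions become segments such as [0], [1], ...
        items = ((f"[{i}]", v) for i, v in enumerate(obj))
    else:
        # Leaf value: store it under the path built so far.
        flat[prefix] = obj
        return flat
    for part, value in items:
        path = f"{prefix}.{part}" if prefix else part
        flat.update(flatten(value, path))
    return flat
```

Nested arrays come out as paths like `x.[1].[0]`, which is exactly the case that makes unpacking awkward.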

But any ideas on how to unpack those flattened dicts back into the original JSON? I had an idea to use string.split() on the key names, but the arrays complicated things and now I don't know how to go about it. The trouble is that array items can themselves be arrays or dicts. I am guessing something recursive?

UPDATE: I have been looking around for packages that unflatten (or flatten + unflatten, I don't care) and I found this one, which works well except that it can't handle paths where the separator character is part of a key name.

For example, I had a path that flattened down into REG_SRC.http://www.awebsite.com/ but when unflattened it got mangled, because the dots in the URL were interpreted as key separators. Does anyone know of a library that can handle key names containing any text, even the separator character? I am assuming it would require the flat paths to be quote-encapsulated or something similar: "REG_SRC"."http://www.awebsite.com/"
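To make the quote-encapsulated idea concrete, here is a rough sketch that builds and splits such paths with the standard `json` module. The helpers `join_path` and `split_path` are hypothetical names, not from any library I found, and using bare integers for array indices is my own assumption.

```python
import json


def join_path(parts):
    """Join path segments, JSON-quoting each one so dots inside keys are safe."""
    return ".".join(json.dumps(p) for p in parts)


def split_path(path):
    """Split a path written by join_path back into its segments."""
    decoder = json.JSONDecoder()
    parts, pos = [], 0
    while pos < len(path):
        # raw_decode parses one JSON value starting at pos and
        # returns it together with the index where it ended.
        part, pos = decoder.raw_decode(path, pos)
        parts.append(part)
        if pos < len(path):
            if path[pos] != ".":
                raise ValueError(f"expected '.' at position {pos}")
            pos += 1  # skip the separator
    return parts
```

With this, `join_path(["REG_SRC", "http://www.awebsite.com/"])` gives `"REG_SRC"."http://www.awebsite.com/"`, and `split_path` recovers the two segments because the dots inside the quotes are never treated as separators.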

Rosey
  • Possible duplicate of [How to flatten nested JSON recursively, with flatten\_json?](https://stackoverflow.com/questions/58442723/how-to-flatten-nested-json-recursively-with-flatten-json). If you use the `flatten` package, there is also a method to unflatten them. – Trenton McKinney Oct 18 '19 at 19:37

1 Answer


You could try this:

import re
from bisect import insort

RE_SPLIT = re.compile(r'(?<!\\)\.')
RE_INDEX = re.compile(r'\[(\d+)\]')

data = {
    "a": "thing",
    "b.c": "foo",
    "array.[1]": "item2",
    "array.[0]": "item1",
    "some\\.text": 'bar2',  # escape a literal `.` so it is not mistaken for the `.` path separator
}

result = {}

for key, value in data.items():
    paths = RE_SPLIT.split(key)
    if len(paths) > 1:
        subkey, subvalue = paths
        m = RE_INDEX.search(subvalue)
        if m:
            # if you care about order this code needs to be enhanced
            insort(result.setdefault(subkey, []),((int(m.group(1)),value)))
        else:
            result.setdefault(subkey, {})[subvalue] = value
    else:
        result[key.replace('\\.', '.')] = value

# convert tuple of elements to just one element like (0, item) to item
for k in result:
    if isinstance(result[k], list):
        result[k] = [e[1] for e in result[k]]

print(result)
{'a': 'thing', 'b': {'c': 'foo'}, 'array': ['item1', 'item2'], 'some.text': 'bar2'}

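I used the bisect module to insert list items in index order. As you can see, item2 ends up in the second position even though it appears first in the input. Note that this code only handles paths with at most two segments.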
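For deeper paths, the same two regexes can be applied segment by segment. The following is only a sketch under some assumptions: `unflatten` and `_listify` are names made up here, a segment that looks exactly like `[0]` is always treated as a list index, indices are assumed to be contiguous, and a key that ends in a backslash is not supported.

```python
import re

RE_SPLIT = re.compile(r'(?<!\\)\.')  # split on dots not preceded by a backslash
RE_INDEX = re.compile(r'\[(\d+)\]')  # a list index segment such as [0]


def unflatten(flat):
    """Rebuild nested dicts/lists from a flat dict of escaped dotted paths."""
    root = {}
    for path, value in flat.items():
        node = root
        parts = RE_SPLIT.split(path)
        for i, part in enumerate(parts):
            m = RE_INDEX.fullmatch(part)
            # Index segments become int keys; other segments get unescaped.
            key = int(m.group(1)) if m else part.replace('\\.', '.')
            if i == len(parts) - 1:
                node[key] = value
            else:
                node = node.setdefault(key, {})
    return _listify(root)


def _listify(node):
    """Recursively turn dicts whose keys are all ints into lists."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(k, int) for k in node):
        return [_listify(node[i]) for i in sorted(node)]
    return {k: _listify(v) for k, v in node.items()}
```

Calling `unflatten(data)` on the dictionary above should give the same result as the loop, and a path such as `x.[1].[0]` is rebuilt as a list nested inside a list. Lists are collected as int-keyed dicts first and converted at the end, so the order of keys in the input does not matter.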

Charif DZ
  • I do not care about key order. List item order may matter though. I don't think your solution will work though as it also does a simple string.split() on the "." char, which will break any flat packed keys that include the "." char. I am able to change the pattern of the flat packed paths though. Could your solution still be used if the paths included escape chars for the "." chars? – Rosey Oct 21 '19 at 15:10
  • @Rosey Yes, the solution will work. Using a regex, I split only on a `.` that is not preceded by `\`, and I remove the escape character in the final result. Check my edits. – Charif DZ Oct 22 '19 at 11:14