-2

The structure of the JSON tree is known. How do we prune the JSON tree in Python 3?

I have been trying to create a medical file format for patients. Each JSON object is a case or a detail about a patient.

I tried linearizing the JSON and counting the levels, but the code quickly became untenable. I also looked at binary trees, but this is not a binary tree. I attempted to treat each JSON object as an atom, which would make it a form of pointer; however, Python does not have pointers.

Examples:

  1. insert / replace json into 0.1.2
  2. delete json at 0.1.1.3
  3. extract json at 0.1.1.1 // may be sub-tree
{ // 0
    "field1": "value1",
    "field2": { // 0.0
        "field3": "val3",
        "field4": "val4"
    }
}

For example, I want to remove 0.0:

{ // 0
    "field1": "value1",
// removed
}

To insert 0.1:

{ // 0
    "field1": "value1",
    "field2": { // 0.0
        "field3": "val3",
        "field4": "val4"
    },

    "field2x": { // 0.1
        "field3x": "val3x",
        "field4x": "val4x"
    }

}

The 0.1 fragment must be supplied:


    "field2x": { // 0.1
        "field3x": "val3x",
        "field4x": "val4x"
    }

Now I want to insert 0.1.0:


    "field2xx": { // 0.1.0
        "field3xx": "val3xx",
        "field4xx": "val4xx"
    }

which gives:

{ // 0
    "field1": "value1",
    "field2": { // 0.0
        "field3": "val3",
        "field4": "val4"
    },

    "field2x": { // 0.1
        "field3x": "val3x",
        "field4x": "val4x",

         "field2xx": { // 0.1.0
             "field3xx": "val3xx",
             "field4xx": "val4xx"
         }
    }

}

Now I want to extract 0.1, which should give me:


    "field2x": { // 0.1
        "field3x": "val3x",
        "field4x": "val4x",

         "field2xx": { // 0.1.0
             "field3xx": "val3xx",
             "field4xx": "val4xx"
         }
    }

leaving:


{ // 0
    "field1": "value1",
    "field2": { // 0.0
        "field3": "val3",
        "field4": "val4"
    }

// removed 0.1

}
Ursa Major
  • 2
    What have you tried, and what exactly is the problem with it? – jonrsharpe Jun 23 '22 at 07:31
  • I tried linearizing the JSON and counting the levels, but the code quickly became untenable. I also looked at binary trees, but this is not a binary tree. I attempted to treat each JSON object as an atom, which would make it a form of pointer; however, Python does not have pointers. – Ursa Major Jun 23 '22 at 07:34
  • @jonrsharpe, are there other forums where I can get some answers if this question gets closed? Thanks for your help. – Ursa Major Jun 23 '22 at 07:36
  • Please [edit] the question to show your research, as recommended in the [help]. – jonrsharpe Jun 23 '22 at 07:36
  • I had been trying to create a medical file format for patients. Each json object is a case or detail about a patient. – Ursa Major Jun 23 '22 at 07:39
  • @jonrsharpe, I added notes. – Ursa Major Jun 23 '22 at 07:42
  • 1
    Why do you want to access fields by a sort of index number? Why not by the key itself, like "root.field2" instead of "0.1". – trincot Jun 23 '22 at 12:43
  • The solution will need both. It is for medical application. There are some indices that are reserved fields. – Ursa Major Jun 23 '22 at 14:12
  • 1
    Assuming a certain order of properties in a plain object, so you can rely on an index to access them, is not considered good practice: when you want to access things by index, you should use *arrays*, not plain objects. – trincot Jun 25 '22 at 07:59
  • Can you provide some code examples, @trincot? Thanks for help. – Ursa Major Jun 27 '22 at 09:51
  • What kind of reputable source are you looking for? – user3840170 Jul 09 '22 at 10:37
  • A working source with multiple test cases will do fine. – Ursa Major Jul 11 '22 at 21:36

3 Answers

3

I would highly recommend not using indices to find a field in a dictionary. Given how JSON works and how it usually maps to dictionaries/maps in a programming language, you generally cannot guarantee that the ordering of the keys is preserved. Depending on your specific Python version it may work, though; you can check the documentation at https://docs.python.org/3.10/library/json.html#json.dump

If you really need this kind of access and these operations, then given a dictionary d you can find the key at index i using list(d.keys())[i], and its value using list(d.values())[i]. With that, you can parse your input parameters, crawl to the point in the document where you need to make your operation, and perform that operation.
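A minimal sketch of that crawl, assuming Python 3.7+ (where dicts keep insertion order) and a zero-based dotted path. The helper name value_at is made up for illustration and is not part of any library:

```python
import json


def value_at(doc, path):
    """Follow a dotted, zero-based positional path such as "1.0" into nested dicts."""
    node = doc
    for part in path.split("."):
        # Positional lookup: materialise the values and pick the i-th one.
        node = list(node.values())[int(part)]
    return node


doc = json.loads('{"field1": "value1", "field2": {"field3": "val3", "field4": "val4"}}')
print(value_at(doc, "1"))    # the whole "field2" sub-tree
print(value_at(doc, "1.0"))  # val3
```

Note that every step builds a fresh list of values, so this is fine for small documents but not something to do in a tight loop.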

Again, I highly, highly advise against this approach as you want to use arrays instead of objects/dictionaries/maps if ordering is important. But if you really have no control over the input format, and you can guarantee that key ordering is preserved, then the above would work.
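For comparison, here is a rough sketch of the array-based alternative. The layout (each node carrying a "name" and a "children" list) and the helper names are assumptions chosen for illustration, not a format taken from the question:

```python
import json

# Hypothetical layout: order lives in the "children" lists, not in object keys.
doc = json.loads("""
{"name": "patient", "children": [
    {"name": "field1", "value": "value1", "children": []},
    {"name": "field2", "children": [
        {"name": "field3", "value": "val3", "children": []},
        {"name": "field4", "value": "val4", "children": []}
    ]}
]}
""")


def parent_of(root, path):
    """Return (the list that contains the target, index of the target in it)."""
    node = root
    for i in path[:-1]:
        node = node["children"][i]
    return node["children"], path[-1]


def extract(root, path):
    siblings, i = parent_of(root, path)
    return siblings.pop(i)  # removes and returns the whole sub-tree


def insert(root, path, node):
    siblings, i = parent_of(root, path)
    siblings.insert(i, node)


removed = extract(doc, [1, 0])                          # take out field3
insert(doc, [0], {"name": "field0", "children": []})    # new first child
print(removed["name"], [c["name"] for c in doc["children"]])
# field3 ['field0', 'field1', 'field2']
```

With lists, insert, delete and extract by position are ordinary list operations, and nothing depends on the key order of an object.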

MangoNrFive
  • 1,541
  • 1
  • 10
  • 23
user731842
  • 202
  • 2
  • 6
2

json.load() and json.loads() in the standard library take an object_pairs_hook parameter that lets you create custom objects from the JSON source.

You want a dict that lets you access items by index as well as by key. So the strategy is to provide a mapping class that lets you access the items either way. Then provide that class as the object_pairs_hook argument.
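To see what the hook receives, here is a small sketch using only the standard library. The function name record_pairs is made up; the hook is called once per JSON object with a list of (key, value) pairs in document order:

```python
import json

seen = []


def record_pairs(pairs):
    # `pairs` is a list of (key, value) tuples in the order they appear in the text.
    seen.append([key for key, _ in pairs])
    return dict(pairs)


result = json.loads('{"b": 1, "a": {"z": 2, "y": 3}}', object_pairs_hook=record_pairs)
print(seen)    # [['z', 'y'], ['b', 'a']]  (inner objects are decoded first)
print(result)  # {'b': 1, 'a': {'z': 2, 'y': 3}}
```

Whatever the hook returns is used in place of the dict, which is what makes a custom mapping class possible.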

There is probably a library that does this, but my Google-fu is off this morning and I couldn't find one. So I whipped this up. Basically the class keeps an internal list of keys by index as well as a regular dict. The dunder methods keep the list and dict in sync.

import json
from collections.abc import MutableMapping

class IndexableDict(MutableMapping):
    def __init__(self, *args, **kwds):
        self.key_from_index = []
        self.data = {}
        
        if args:
            for key, value in args:
                self.__setitem__(key, value)
                
        if kwds:
            for key, value in kwds.items():
                self.__setitem__(key, value)
            
        
    def __setitem__(self, key_or_index, value):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index[:-1]:
                obj = obj[item]
                
            obj[key_or_index[-1]] = value

        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            self.data[key] = value
            
        elif isinstance(key_or_index, str):
            if key_or_index not in self.data:
                self.key_from_index.append(key_or_index)
                
            self.data[key_or_index] = value
            
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")

            
    def __getitem__(self, key_or_index):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index:
                obj = obj[item]
                    
            return obj
        
        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            return self.data[key]
            
        elif isinstance(key_or_index, str):
            return self.data[key_or_index]
        
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")
            
    
    def __delitem__(self, key_or_index):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index[:-1]:
                obj = obj[item]
                
            del obj[key_or_index[-1]]
        
        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            del self.data[key]
            del self.key_from_index[key_or_index]
            
        elif isinstance(key_or_index, str):
            # Lists have no find(); index() returns the position of the key.
            index = self.key_from_index.index(key_or_index)
            del self.key_from_index[index]
            del self.data[key_or_index]
        
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")
       
    
    def __iter__(self):
        # A mapping iterates over its keys; yield them in index order.
        yield from self.key_from_index
        
        
    def __len__(self):
        return len(self.data)
    
    def __repr__(self):
        s = ', '.join(f'{k}={v!r}' for k, v in self.items())
        if len(s) > 50:
            s = s[:47] + '...'
        return f'<IndexableDict({s})>'

It can be used like this:

data = """{"zero":0, "one":{"a":1, "b":2}, "two":[3, 4, 5]}"""

def object_pairs_hook(pairs):
    return IndexableDict(*pairs)

dd = json.loads(data, object_pairs_hook=object_pairs_hook)


print(dd[0], dd['zero'])  # get values by index or key
print(dd[(1,0)])          # get values by a list or tuple of keys
                          #    equivalent to dd[1][0]

print(dd[(2,1)])
dd[['two', 1]] = 42       # sequence works to set a value too
print(dd[(2,1)])
            

Prints:

0 0
1
4
42

No time to do an insert(), but it should be similar to __setitem__(). It has not been tested much, so there may be some bugs. It could also use some refactoring.
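As a rough idea of what such an insert() might look like, here is a sketch on a stripped-down stand-in class that keeps a key list next to a regular dict. KeyListDict and its methods are hypothetical and untested beyond this example; it is not the class above:

```python
class KeyListDict:
    """Stand-in for illustration: a list of keys in order plus a regular dict."""

    def __init__(self, pairs=()):
        self.key_from_index = []
        self.data = {}
        for key, value in pairs:
            self.insert(len(self.key_from_index), key, value)

    def insert(self, index, key, value):
        # When the key already exists, drop its old position first (replace).
        if key in self.data:
            self.key_from_index.remove(key)
        self.key_from_index.insert(index, key)
        self.data[key] = value

    def items(self):
        return [(k, self.data[k]) for k in self.key_from_index]


d = KeyListDict([("zero", 0), ("two", 2)])
d.insert(1, "one", 1)
print(d.items())  # [('zero', 0), ('one', 1), ('two', 2)]
```

The only real work is keeping the key list and the dict in step; the position bookkeeping is plain list.insert().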

RootTwo
  • I want to transfer some bounty points (+50) to you, @RootTwo, can you advise me if there is a means to do so? – Ursa Major Jul 21 '22 at 21:26
  • 1
    @UrsaMajor, I'm not sure. I think you can start a bounty using the link at the end of the question. Then come back after 24 hours and assign it to an answer. – RootTwo Jul 21 '22 at 22:08
1

I second the people saying that indexing a dictionary by position is not the natural way, but it is possible since Python 3.7, where insertion order of dicts became a guaranteed language feature.
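A quick illustration of what that guarantee does and does not cover, using a plain dict: assigning to an existing key keeps its position, while removing and re-adding a key moves it to the end.

```python
d = {"a": 1, "b": 2, "c": 3}

d["a"] = 10           # updating an existing key keeps its position
print(list(d))        # ['a', 'b', 'c']

d["b"] = d.pop("b")   # removing and re-adding moves the key to the end
print(list(d))        # ['a', 'c', 'b']
```

So placing a key at an arbitrary position means re-adding every key that should come after it.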

This is my working example. The indices differ from your schematic (they are zero-based and leave out the root 0), but it made more sense to me to index it like that. It recursively traverses the data along the given indices and then, depending on the operation, removes, inserts or returns the nested data.

The insertion of data makes use of the mentioned ordering by insertion in Python. Since dict.update() never moves a key that is already present, insert() rebuilds the target dict in place:

  • First the entries before the insertion index are re-added (so they stay in front).
  • Then the inserted data is added.
  • And last the entries after the insertion index are re-added (so they end up at the back).

from copy import deepcopy
import json


def traverse(data, index_list):
    """Return (parent dict, index within it) for a path of zero-based indices."""
    # Consume the path front to back without mutating the caller's list.
    index, *rest = index_list
    if rest:
        nested_data = list(data.values())[index]
        return traverse(nested_data, rest)
    return data, index


def insert(data, data_insert, index_list):
    data, index = traverse(data, index_list)
    items = list(data.items())
    # Rebuild in place so the new keys land at position `index`.
    data.clear()
    data.update(items[:index])
    data.update(data_insert)
    # Skip keys that were just inserted, so an insert also acts as a replace.
    data.update((k, v) for k, v in items[index:] if k not in data_insert)


def remove(data, index_list):
    key, data = get(data, index_list)
    return {key: data.pop(key)}


def get(data, index_list):
    # Returns the key at the given position and the dict that contains it.
    data, index = traverse(data, index_list)
    key = list(data.keys())[index]
    return key, data

def run_example(example_name, json_in, index_str, operation, data_insert=None):
    print("-" * 40 + f"\n{example_name}")

    print(f"json before {operation} at {index_str}:")
    print(json.dumps(json_in, indent=2, sort_keys=False))

    index_list = [int(idx_char) for idx_char in index_str.split(".")]
    if operation == "insert":
        json_out = insert(json_in, data_insert, index_list)
    elif operation == "remove":
        json_out = remove(json_in, index_list)
    elif operation == "get":
        key, data = get(json_in, index_list)
        json_out = {key: data[key]}
    else:
        raise NotImplementedError("Not a valid operation")

    print(f"json after:")
    print(json.dumps(json_in, indent=2, sort_keys=False))

    print(f"json returned:")
    print(json.dumps(json_out, indent=2, sort_keys=False))


json_data = {
    "field1": "value1",
    "field2": {
        "field3": "val3",
        "field4": "val4"
    }
}

run_example("example 1", deepcopy(json_data), "1", "remove")
run_example("example 2", json_data, "2", "insert", {"field2x": {"field3x": "val3x", "field4x": "val4x"}})
run_example("example 3", json_data, "2", "get")
run_example("example 4", json_data, "2.2", "insert", {"field2xx": {"field3xx": "val3xx", "field4xx": "val4xx"}})
run_example("example 5", json_data, "2", "remove")

This gives the following output:

----------------------------------------
example 1
json before remove at 1:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json after:
{
  "field1": "value1"
}
json returned:
{
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
----------------------------------------
example 2
json before insert at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json returned:
null
----------------------------------------
example 3
json before get at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json returned:
{
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
----------------------------------------
example 4
json before insert at 2.2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
json returned:
null
----------------------------------------
example 5
json before remove at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json returned:
{
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
MangoNrFive