
I'm trying to find a way to add or remove one or more elements in a JSON object, given the keys of their parent objects, regardless of the object's schema.

Here is an example. Suppose we have the following JSON object:

{
    "field1": "",
    "field2": "",
    "list1": [
        {
            "list1_field1": "",
            "list1_obj1": {
                "list1_obj1_field1": ""
            },
            "list1_field2": ""
        },
        {
            "list1_field1": "",
            "list1_obj1": {
                "list1_obj1_field1": ""
            },
            "list1_field2": "",
            "list1_field3": "",
            "list1_sublist1": [
                {
                    "list1_sublist1_field1": ""
                }
            ]
        }
      ]
}

Now, let's assume that I'd like to add a new field to the "list1_obj1" object in every element of "list1". The keys would then be "list1" and "list1_obj1", and the new field would be, for example, "list1_obj1_field2".

To sum up, given the keys "list1" and "list1_obj1" as input, I'd like to add or remove a field at that nested level without relying on the schema of the JSON object.

Of course, the assumption is that "list1" and "list1_obj1" exist in the JSON file, and in case of removal, "list1_obj1_field2" exists as well.
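
To make the goal concrete, here is a small sketch of what the operation looks like when the path through "list1" is hard-coded, which is exactly the schema knowledge I'd like to avoid. The object is a trimmed-down version of the one above, and the names `added` and `removed` are only for illustration.

```python
from copy import deepcopy

# Trimmed-down version of the example object (assumption: only the
# parts relevant to "list1" -> "list1_obj1" are kept)
json_object = {
    "field1": "",
    "list1": [
        {"list1_field1": "", "list1_obj1": {"list1_obj1_field1": ""}},
        {"list1_field1": "", "list1_obj1": {"list1_obj1_field1": ""}},
    ],
}

# Add: the loop over "list1" is written by hand, so the code knows the schema
added = deepcopy(json_object)
for element in added["list1"]:
    element["list1_obj1"]["list1_obj1_field2"] = ""

# Remove: same hand-written traversal, deleting the field again
removed = deepcopy(added)
for element in removed["list1"]:
    del element["list1_obj1"]["list1_obj1_field2"]
```

What I'm looking for is a function that gets the same result from just the key path, without knowing in advance which levels are lists.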

The hardest part for me is handling nested lists of objects. Without that constraint, I could implement a solution like the ones in the following threads 1 2.

To achieve that, I came up with something like the following:

# Remove an item from the JSON object
# Suppose the JSON object is stored in a variable called "json_object"
keys = "list1.list1_obj1.list1_obj1_field2".split(".")
item = json_object
for i, key in enumerate(keys):

  if isinstance(item, dict):
    print("it's a dict")
    if key in item:
      print(item)
      if i == len(keys) - 1:
        # last key, so we can remove it
        del item[key]
      else:
        item = item[key]

  else:
    print("it's a list")
    # TODO: loop on the list and remove the item from each element
    break

If the nested item is a list, I think I should iterate over it and, for each element, find the correct item to remove. However, this seems inefficient to me. I also tried, unsuccessfully, to figure out a way to make the function recursive.

Any hint would be really appreciated.

Many thanks

EDIT 1:

I managed to implement a first recursive version.

def remove_element(obj, keys, current_key=0):
  """
  obj: the item passed in the function. At the beginning it is the entire json object
  keys: list that represents the complete key path from the root to the interested field
  current_key: index which points to keys list elements
  """

  if isinstance(obj, dict):
    for k in obj.keys():
      if k == keys[current_key]:
        if isinstance(obj[k], dict):
          obj[k] = remove_element(obj[k], keys, current_key+1)
        elif isinstance(obj[k], list):
          for i in range(len(obj[k])):
            obj[k][i] = remove_element(obj[k][i], keys, current_key+1)
        else:
          # Blank the value instead of deleting it: "del obj[k]" here would
          # change the dict size while it is being iterated
          obj[k] = ""

  # Return obj for every type, so non-dict list elements are not replaced by None
  return obj

Currently, the function doesn't delete the desired field; it only sets it to "", because I get a RuntimeError: dictionary changed size during iteration if I try to delete it (del obj[k]).

The improvement is that it's now possible to reach a field without considering the schema. However, it's still not possible to delete it, and only fields without children (anything that is not a list or dict) can be targeted.
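
For reference, here is a minimal sketch of two common ways around that RuntimeError, assuming the goal is just to drop one key from a single dict. The helper names `delete_while_iterating` and `delete_with_pop` are made up for illustration and are not part of the code above.

```python
def delete_while_iterating(d, key):
    # list(d.keys()) takes a snapshot of the keys, so the dict itself
    # can shrink inside the loop without invalidating the iteration
    for k in list(d.keys()):
        if k == key:
            del d[k]
    return d


def delete_with_pop(d, key):
    # No loop needed when the key name is known: pop() removes the key
    # if present, and the None default avoids a KeyError otherwise
    d.pop(key, None)
    return d


example = {"list1_obj1_field1": "", "list1_obj1_field2": ""}
delete_while_iterating(example, "list1_obj1_field2")
```

Either approach could probably replace the obj[k] = "" line, although I haven't integrated them into the recursive function yet.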

sergio zavota

1 Answer


I finally managed to implement the add and remove methods. Since updating a field is very similar to removing it (only one line of code changes), I merged the two into a single function.

def add(obj, keys, obj_copy, value, current_key=0):  
  """
  obj: the item passed in the function. At the beginning it is the entire json object
  keys: the complete key path from the root to the interested field
  obj_copy: copy of obj. obj is used to iterate, obj_copy is updated. 
            This is done to prevent the "update dictionary during a loop" Runtime Error  
  value: value used to add the desired field. It can be a primitive type, a dict or a list
  current_key: index which points to keys list 
  """
  
  if isinstance(obj, dict):
    for k in obj.keys():
      if current_key != (len(keys)-1):
        if k == keys[current_key]:
          if isinstance(obj[k], dict):
            obj_copy[k] = add(obj[k], keys, obj_copy[k], value, current_key+1)
          elif isinstance(obj[k], list):
            for i in range(len(obj[k])):
                obj_copy[k][i] = add(obj[k][i], keys, obj_copy[k][i], value, current_key+1)
      else:
        obj_copy[keys[current_key]] = value
        break
            
  return obj_copy


def update(obj, keys, obj_copy, function, value=None, current_key=0):
  
  """
  obj: the item passed in the function. At the beginning it is the entire json object
  keys: the complete key path from the root to the interested field
  obj_copy: copy of obj. obj is used to iterate, obj_copy is updated. 
            This is done to prevent the "update dictionary during a loop" Runtime Error  
  function: "delete" if you want to delete an item, "update" if you want to update it
  value: value used to update the desired field. It can be a primitive type, a dict or a list
  current_key: index which points to keys list 
  """
  
  if isinstance(obj, dict):
    for k in obj.keys():
      if k == keys[current_key]:
        if current_key != (len(keys)-1):
          if isinstance(obj[k], dict):
            obj_copy[k] = update(obj[k], keys, obj_copy[k], function, value, current_key+1)
          elif isinstance(obj[k], list):
            for i in range(len(obj[k])):
                obj_copy[k][i] = update(obj[k][i], keys, obj_copy[k][i], function, value, current_key+1)
        else:
          if function == "delete":
            del obj_copy[k]
          else:
            obj_copy[k] = value
            
  return obj_copy

The only thing I'm not satisfied with is that "add" and "update" are almost the same, except for one line of code and the fact that the order of these two if statements has to be swapped:

# update
if k == keys[current_key]:
   if current_key != (len(keys)-1):

# add
if current_key != (len(keys)-1):
  if k == keys[current_key]:

I still need to figure out how to optimize the solution.
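
One possible direction, as a rough sketch rather than a finished solution: handle lists at the top of the recursion and consume the key path one key at a time, so that add, update and delete only differ at the last key. The function name `apply_at_path` and its `action` parameter are hypothetical. Unlike the functions above, it modifies the object in place, so a deepcopy is needed if the original has to be preserved.

```python
from copy import deepcopy


def apply_at_path(obj, keys, action, value=None):
    """Hypothetical helper: apply "add", "update" or "delete" at the
    key path `keys` (a non-empty list), fanning out over any lists."""
    if isinstance(obj, list):
        # A list does not consume a key: apply the same path to each element
        for element in obj:
            apply_at_path(element, keys, action, value)
        return obj
    if not isinstance(obj, dict):
        # Primitive value reached before the path ended: nothing to do
        return obj

    head, rest = keys[0], keys[1:]
    if rest:
        # Not at the last key yet: descend only if the key exists
        if head in obj:
            apply_at_path(obj[head], rest, action, value)
    elif action == "delete":
        obj.pop(head, None)
    elif action == "add" or (action == "update" and head in obj):
        # "add" creates the key; "update" only overwrites an existing one
        obj[head] = value
    return obj


data = {
    "list1": [
        {"list1_obj1": {"list1_obj1_field1": ""}},
        {"list1_obj1": {"list1_obj1_field1": ""}},
    ]
}
path = "list1.list1_obj1.list1_obj1_field2".split(".")
with_field = apply_at_path(deepcopy(data), path, "add", "")
without_field = apply_at_path(deepcopy(with_field), path, "delete")
```

Because nothing is deleted while a dict is being iterated, this version should not need the obj/obj_copy pair, but I have only tried it on small inputs like the one shown.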

Furthermore, to make the interface simpler, I implemented a wrapper function. Here is how it works:

# WRAPPER
from copy import deepcopy

def update_element(obj: dict, keys: str, function: str, value=None):
  
  """
  Description:
    add, remove or update an element
    
  Input:
    obj: the object passed in the function.
    keys: the complete key path from the root to the interested field. The fields have to be separated by "."
    function: "delete" if you want to delete an item, "update" if you want to update it, "add" if you want to add it
    value: value used to update the desired field. It can be a primitive type, a dict or a list
  """
  
  keys = keys.split(".")
  obj_copy = deepcopy(obj)
  
  if function == "add":
    output = add(obj, keys, obj_copy, value)
  elif function == "update" or function == "delete":
    output = update(obj, keys, obj_copy, function, value)
  else:
    return {"message": "error: no function recognized. Possible values are: 'add', 'delete' or 'update' "}
  
  return output

Example:

thisdict =  {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}

delete_output = update_element(thisdict, "model", "delete")
update_output = update_element(thisdict, "model", "update", "Fiesta")
add_output = update_element(thisdict, "used", "add", "yes")
sergio zavota