
I have a JSON config, and based on user input I need to filter it and keep only a specific section. I tried the code below, but it only partially returns the expected result.

Config:

superset_config = """
                [ {
                      "Area":"Texas",
                      "Fruits": { 
                                  "RED": {
                                           "Apple":["val1"],
                                           "Grapes":["green"]
                                         },
                                  "YELLOW": {"key2":["val2"]} 
                                }
                  },
                  {
                      "Area":"Dallas",
                      "Fruits": { 
                                    "GREEN": { "key3": ["val3"]} 
                                } 
                  }
                ]
                """

User Input:

inputs = ['Apple'] # input list

Code:

import json

derived_config = []
for each_src in json.loads(superset_config):
    temp = {}
    for src_keys in each_src:
        if src_keys=='Fruits':
            temp_inner ={}
            for key,value in each_src[src_keys].items():
                metrics = {key_inner:value_inner for key_inner,value_inner in value.items() if key_inner in inputs}
                temp_inner[key]=metrics
            temp[src_keys] = temp_inner
        else:
            temp[src_keys] = each_src[src_keys]
    derived_config.append(temp)

What I get from the code above:

derived_config= [
                    {'Area': 'Texas', 
                     'Fruits': {'RED': {'Apple': ['val1']}, 
                                'YELLOW': {}
                               }
                    }, 
                    {'Area': 'Dallas', 
                    'Fruits': {'GREEN': {}
                              }
                    }
                ]
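
A small sketch of where the empty dicts come from, assuming the same `inputs = ['Apple']`: the inner dict comprehension in the loop still runs for colors such as "YELLOW" that contain no requested fruit, and it evaluates to an empty dict instead of being skipped. The outer loop likewise appends every area unconditionally, which is why "Dallas" appears with `{'GREEN': {}}`.

```python
# The 'YELLOW' color map from the Texas entry of superset_config.
yellow = {"key2": ["val2"]}
inputs = ['Apple']

# Same comprehension as in the loop: nothing in 'YELLOW' matches,
# so the result is an empty dict rather than "no entry".
metrics = {k: v for k, v in yellow.items() if k in inputs}
print(metrics)        # {}

# An empty dict is falsy, so a plain truthiness check can skip it.
print(bool(metrics))  # False
```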

What I need is the following result:

derived_config= [
                    {'Area': 'Texas', 
                     'Fruits': {'RED': {'Apple': ['val1']}
                               }
                    }
                ]

Can anyone please help? Thanks.

Matthew

1 Answer


Maybe something like this:

import json

inputs = ['Apple'] # input list

derived_config = []
for each_src in json.loads(superset_config):
    filtered_fruits = {k: v for k, v in (each_src.get('Fruits') or {}).items()
                       if any(input_ in v for input_ in inputs)}

    if filtered_fruits:
        each_src['Fruits'] = filtered_fruits
        derived_config.append(each_src)

print(derived_config)

Edit: Based on the comments, it looks like you also want to filter the inner color maps (such as "RED") by the input list of fruits. In that case, we don't need the any function used above.

There is also a risk of unintentionally mutating the original source config. The code above assigns to each_src['Fruits'], which modifies the parsed dicts themselves. So if you save the result of json.loads(superset_config) to a variable and then filter it several times for different fruits, each run can change the config object that the next run sees. If you call json.loads each time instead, every run starts from a fresh object and you don't need to worry about this; still, be aware that because list and dict are mutable types in Python, this can be a concern.
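A sketch of that pitfall, using the question's config. The first version of the answer is wrapped in a function here only so it can be called twice; the name `filter_in_place` is hypothetical, and `'key2'` is used as an input purely because it is a key that exists under "YELLOW".

```python
import json

superset_config = """[
  {"Area": "Texas",
   "Fruits": {"RED": {"Apple": ["val1"], "Grapes": ["green"]},
              "YELLOW": {"key2": ["val2"]}}},
  {"Area": "Dallas",
   "Fruits": {"GREEN": {"key3": ["val3"]}}}
]"""


def filter_in_place(config, inputs):
    """Hypothetical wrapper around the answer's first version."""
    derived = []
    for each_src in config:
        filtered = {k: v for k, v in (each_src.get('Fruits') or {}).items()
                    if any(input_ in v for input_ in inputs)}
        if filtered:
            each_src['Fruits'] = filtered  # rebinds the key on the ORIGINAL dict
            derived.append(each_src)
    return derived


config = json.loads(superset_config)  # parsed once, then reused

first = filter_in_place(config, ['Apple'])
print(config[0]['Fruits'])  # 'YELLOW' is gone from the source config

second = filter_in_place(config, ['key2'])
print(second)  # [] even though the JSON text has 'key2' under 'YELLOW'

# Re-parsing gives a fresh object, so the same query works again.
fresh = filter_in_place(json.loads(superset_config), ['key2'])
print(fresh)   # [{'Area': 'Texas', 'Fruits': {'YELLOW': {'key2': ['val2']}}}]
```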

The solution below avoids mutating the original source object: it builds new dicts for the filtered fruits and works on a copy of each entry (each_src.copy()) rather than on the entry itself. But again, if you are calling json.loads each time anyway, you don't need to worry about this and are free to modify the parsed config object.

import json

# Note: On Python 3.9+ you can use the built-in `dict` and `list` types directly
# in annotations, since they support subscripting (e.g. `dict[str, Any]`).
from typing import Dict, Any, List


# A mapping of fruit name to its values, e.g. {"Apple": ["val1"]}.
FruitMap = Dict[str, Any]

# The inferred type of the 'Fruits' key in the superset config.
#   This is a mapping of fruit color to a `FruitMap`.
Fruits = Dict[str, FruitMap]

# The inferred type of the superset config.
Config = List[Dict[str, Any]]


def get_fruits_config(src_config: Config, fruit_names: List[str]) -> Config:
    """
    Returns the specified fruit section(s) from the superset config.
    """

    fruits_config: Config = []
    final_src: Dict

    for each_src in src_config:
        fruits: Fruits = each_src.get('Fruits') or {}
        final_fruits: Fruits = {}

        for fruit_color, fruit_map in fruits.items():
            desired_fruits = {fruit: val for fruit, val in fruit_map.items()
                              if fruit in fruit_names}
            if desired_fruits:
                final_fruits[fruit_color] = desired_fruits

        if final_fruits:
            final_src = each_src.copy()
            final_src['Fruits'] = final_fruits
            fruits_config.append(final_src)

    return fruits_config

Usage:

inputs = ['Apple'] # input list

config = json.loads(superset_config)
derived_config = get_fruits_config(config, inputs)

print(derived_config)
# prints:
#   [{'Area': 'Texas', 'Fruits': {'RED': {'Apple': ['val1']}}}]
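One caveat worth sketching: `each_src.copy()` is a shallow copy. The source entry's own dicts are left alone, but the innermost value lists (such as `['val1']`) are the same objects in both the result and the source. The snippet below repeats the function's copying steps on a single entry; the appended values `'val9'` and `'only-here'` are hypothetical, and `copy.deepcopy` is shown as one way to get fully independent results if you plan to modify them.

```python
import copy

src = {'Area': 'Texas', 'Fruits': {'RED': {'Apple': ['val1'], 'Grapes': ['green']}}}
fruit_names = ['Apple']

# Same idea as get_fruits_config: shallow-copy the entry, then give it a new 'Fruits' dict.
result = src.copy()
result['Fruits'] = {color: {f: v for f, v in fmap.items() if f in fruit_names}
                    for color, fmap in src['Fruits'].items()}

print(src['Fruits']['RED'])  # {'Apple': ['val1'], 'Grapes': ['green']} (untouched)

# The innermost lists are still shared between result and source.
print(result['Fruits']['RED']['Apple'] is src['Fruits']['RED']['Apple'])  # True
result['Fruits']['RED']['Apple'].append('val9')
print(src['Fruits']['RED']['Apple'])  # ['val1', 'val9']

# A deep copy gets its own lists, so changes no longer reach the source.
independent = copy.deepcopy(result)
independent['Fruits']['RED']['Apple'].append('only-here')
print(src['Fruits']['RED']['Apple'])  # ['val1', 'val9']
```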
rv.kvetch
  • Thanks, rv.kvetch. Can you help me understand the use of `any` in the code above? – Matthew Sep 04 '21 at 04:40
  • Also, it doesn't seem to work for all cases. For example, if I add one more entry under "RED" (added after your answer), it still returns all the values within the matching block. In the case above it returns both Apple and Grapes, i.e. [{'Area': 'Texas', 'Fruits': {'RED': {'Apple': ['val1'], 'Grapes': ['green']}}}], although only Apple is requested by the input. Can you help? – Matthew Sep 04 '21 at 05:24
    `any` is a built-in function that returns a boolean; in this case it indicates whether the dictionary contains *any* of the keys we are looking for. As a specific example, given `inputs=['Apple', 'Orange']`, it loops over each input and checks whether `derived_config[0]['Fruits']['RED']` contains at least an `Apple` or an `Orange` key. From your last comment, though, I understand you might need a slightly modified solution, so `any` probably won't be needed here. – rv.kvetch Sep 04 '21 at 16:54
    @Matthew I updated my answer based on the requested changes. Hopefully that should now give the desired result. – rv.kvetch Sep 04 '21 at 17:52
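
A short sketch of the `any` check discussed in the comments, using the color maps from the question and the `inputs=['Apple', 'Orange']` example (`'Orange'` does not appear in the config). It also shows why the first version returned Grapes: it keeps the whole matching color map instead of filtering the fruits inside it.

```python
red = {'Apple': ['val1'], 'Grapes': ['green']}
yellow = {'key2': ['val2']}
inputs = ['Apple', 'Orange']

# `key in some_dict` tests the dict's keys; any() is True if at least one test is True
# (and stops at the first True, so 'Orange' is never checked against `red`).
print(any(input_ in red for input_ in inputs))     # True
print(any(input_ in yellow for input_ in inputs))  # False

# The first version keeps each matching color map as a whole, so 'Grapes' comes along.
fruits = {'RED': red, 'YELLOW': yellow}
kept = {k: v for k, v in fruits.items() if any(input_ in v for input_ in inputs)}
print(kept)  # {'RED': {'Apple': ['val1'], 'Grapes': ['green']}}
```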