0

I have a list of json objects like this -

[{Level1: "A", Level2: "B", Level3: "C", Level4: "item1"}, 
 {Level1: "A", Level2: "B", Level3: null, Level4: "item2"}, 
 {Level1: "D", Level2: null, Level3: null, Level4: "item3"}]

In Python, I want to group them by level to create a tree structure.

{text: "root": items: 
    [{text: "A", items: [ 
         {text: "B", items: [
              {text: "C", items: [
                  text: "item1", items:[]]}, 
              {text: "item2", items: []}}]}, 
     {text: "D", items: [{text: "item3", items: []}]}
    ]
]}

# pseudocode 
result = dict()
result["text"] = "root"
result["items"] = [] 

d = {"Level1": set(), "Level2": set(), "Level3": set(), "Level4": set()}

for row in data_rows: 
    insertLocation = result["items"] 
    for key in ["Level1", "Level2", "Level3", "Level4"]: 
        txt = row[key]
        if txt in d[key]: 
            for j in insertLocation: 
                if j.text = txt: 
                     insertLocation = j
                     break  
        else: 
            newItem = {"text": txt, "items": []}
            insertLocation = newItem.items
            d[key].add(txt)
        

Can anyone provide any feedback on my code to perhaps make it more efficient? (or if there's a better way to do this, would be super great to know). I'm really looking to maximize efficiency.

sy89
  • 195
  • 2
  • 14
  • Does your program do what you want it to do? – Trenton McKinney Jun 24 '20 at 02:41
  • It is unclear to me if you have a problem with the code or just want a feedback on it. – Vinícius A. L. Souza Jun 24 '20 at 02:51
  • 1
    it works - I just don't love how it looks and wondering if there's any smarter ways to do it. Like more concise ways – sy89 Jun 24 '20 at 03:04
  • There is definitely a more elegant way, and I imagine it involves nested `defaultdict` objects. See here: https://stackoverflow.com/questions/19189274/nested-defaultdict-of-defaultdict Since you only care about the values, you could immediately simplify the problem by changing your input to be just the dict values. You only care about the order of the non-null values. – Peaceful James Jun 24 '20 at 03:52
  • 2
    I’m voting to close this question because it belongs on [Code Review](https://codereview.stackexchange.com/). Optimization questions tend to be opinion based, and are outside the scope of SO. – Trenton McKinney Jun 24 '20 at 04:23

3 Answers3

0

I suggest you check this answer by @anom on "What are some good code optimization methods?". In summary:

Step 1. Do not think about performance, think about clarity and correctness.
...
Step 4. See step 1.
0

Here is how I would suggest doing it. To understand this, you will have to understand:

  1. functools.reduce
  2. collections.defaultdict
  3. Recursion
  4. List comprehensions

Basically, I first change the input (because the keys don't matter), to be a list of lists of non-null values. Then I functools.reduce the lists of non-null values into a simple tree structure which is of the shape that we want. Everything in this simple_tree is a collections.defaultdict so I finally convert these into normal dicts and use recursion to get the "text"/"items" structure that you want.

This approach is more "functional" than your approach, and is smarter about pruning the input.

import collections
import functools


sequence = [
    {"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
    {"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"},
    {"Level1": "D", "Level2": None, "Level3": None, "Level4": "item3"},
]


# Change the input to be only the non-null values (order is preserved)
# I am using 2 list comprehensions here
seq_values = [[y for y in x.values() if y is not None] for x in sequence]


# Define a recursive defaultdict
# https://stackoverflow.com/questions/19189274/nested-defaultdict-of-defaultdict
def recursive_defaultdict():
    return collections.defaultdict(recursive_defaultdict)


# Use reduce to get a simple tree in the shape/structure we want
def reduce_function(agg, cur):
    sub_agg = agg
    for seq_value in cur:
        sub_agg = sub_agg[seq_value]
    return agg


# Use reduce to get a simple tree in the shape/structure we want
simple_tree = functools.reduce(
    reduce_function,
    seq_values,
    recursive_defaultdict()
)


# Use recursion to change simple "defaultdict tree" into custom text/items tree
def convert_defaultdict_tree_to_custom_dict(dd_tree):
    if dd_tree:
        return [{"text": x, "items": convert_defaultdict_tree_to_custom_dict(y)} for x, y in dd_tree.items()]
    return []


# Here is the final_result, the thing you want.
final_result = {
    "text": "root",
    "items": convert_defaultdict_tree_to_custom_dict(simple_tree)
}

# Test the final_result is correct
expected_result = {
    "text": "root",
    "items": [
        {
            "text": "A",
            "items": [
                {
                    "text": "B",
                    "items": [
                        {
                            "text": "C",
                            "items": [
                                {"text": "item1", "items": []},
                            ],
                        }, {
                            "text": "item2",
                            "items": [],
                        }
                    ],
                }
            ],
        },
        {"text": "D", "items": [{"text": "item3", "items": []}]},
    ],
}

assert final_result == expected_result

You can see at the end I do an assert to make sure the final_result is what we want. Let me know if you want help understanding any of this.

Peaceful James
  • 1,807
  • 1
  • 7
  • 16
  • I'm happy you like it! The 4 things I mentioned at the start are all you need to know. I can explain any of them in simple language if you want. I have to lie down now. All that recursion has given me vertigo... – Peaceful James Jun 24 '20 at 04:39
  • I actually am familiar with functional programming so it makes perfect sense - thanks again! – sy89 Jun 24 '20 at 05:16
0

Here's an alternative:

data_rows = [
 {"Level1": "A", "Level2": "B", "Level3": "C",  "Level4": "item1"}, 
 {"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"}, 
 {"Level1": "D", "Level2": None,"Level3": None, "Level4": "item3"}
]

result = { "root": {} }

for row in data_rows:
    current_level = result["root"]
    # go through every "level" (key) in the row except the last one
    for level in row.keys()[:-1]:
        # make sure the level is not None
        if row[level]:
            # check if the level already exists
            if row[level] not in current_level:
                # if it doesn't, create this level
                current_level[row[level]] = {}
            # we move to the new level
            current_level = current_level[row[level]]

    # we set the "item" in the last valid level
    current_level["items"]=row[row.keys()[-1]]

import json
print(json.dumps(result, indent=4))

This will create a tree as below:

{
    "root": {
        "A": {
            "B": {
                "items": "item2",
                "C": {
                    "items": "item1"
                }
            }
        },
        "D": {
            "items": "item3"
        }
    }
}

So instead of having a separate key "text" with the level name, the level name itself becomes the key.

The above code will work with any number of levels, as long as the last item in every data_row is the "item"

kamion
  • 461
  • 2
  • 9