flattening linked dictionary values

Question

How would I flatten:

a = {}
a["foo"] = ["bar"]
a["bar"] = ["baz", "baz2"]
a["baz"] = ["bax"]

so that a["foo"] = ["bar", "baz", "baz2", "bax"].

Another example:

a = {}
a["foo"] = ["bar"]
a["bar"] = ["baz"]
a["bar2"] = ["bax"]

output: a["foo"] = ["bar", "baz"] and a["bar2"] = ["bax"]

Notice how the bar2 index is still there because it has not been merged/flattened.

Does this answer your question? [How do I make a flat list out of a list of lists?](https://stackoverflow.com/questions/952914/how-do-i-make-a-flat-list-out-of-a-list-of-lists) — Nathaniel Ford, Dec 05 '22 at 21:40
@NathanielFord: I think the question here is more convoluted than merely flattening embedded lists. — Swifty, Dec 05 '22 at 21:49
I mean, I could recommend closing for a number of other reasons: the second example provides, for instance, no distinguishing characteristic from the first, but expects a different output - meaning it should be closed for lacking detail. Alternatively, we could close it for the fact that the OP hasn't actually tried anything, shown code or explained what actual obstacle they're facing in their own efforts. Meaning it doesn't meet the minimum guidelines for a question. — Nathaniel Ford, Dec 05 '22 at 21:53
I can agree on most of that; the 2nd example though differs from the first in that it holds 2 sets of separate "primary" keys. — Swifty, Dec 05 '22 at 22:05
@NathanielFord a missing attempt is not a reason for closure. You can downvote if you want for lack of research/attempt. But closing should be if it's a duplicate or there are missing details — Tomerikoo, Dec 05 '22 at 22:12
@maxisme Why is `bar2` not flattened, but the others are? Is it the suffix, or some other reason? — Nathaniel Ford, Dec 05 '22 at 23:21
@NathanielFord the bar2 key contains `"bax"` which is not a key in the dictionary. — Chris, Dec 05 '22 at 23:37

Lexpj · Accepted Answer · 2022-12-06T10:53:43.170

A simple recursive function could solve this for you:

a = {}
a["foo"] = ["bar"]
a["bar"] = ["baz", "baz2"]
a["baz"] = ["bax"]

def flatten(dic,arg):
    if dic.get(arg,[]) == []:
        return []
    lst = []
    for item in dic[arg]:
        lst += flatten(dic,item) + [item]
    return lst 

print(flatten(a,'foo'))
>> ['bax', 'baz', 'baz2', 'bar']

The dic argument defines what dictionary you are using, the arg is the key of the dictionary. The function tries to find for every ite, a new entry as key in the dictionary. This algorithm performs a tree-like walk to the roots, and adds the items when it goes away from each root. This is also known as a post-order walk. You can then do this for every entry in the dictionary to flatten all keys.

NOTE that I did not take into account order. You could use another tree-walking method, to change the order of when items are added to the list.

To complete the answer, I applied it to your second example. I made a new dictionary that flattens the current dictionary by:

a = {i:flatten(a,i) for i in a.keys()}

So, the complete code will then be:

a = {}
a["foo"] = ["bar"]
a["bar"] = ["baz"]
a["bar2"] = ["bax"]

def flatten(dic,arg):
    if dic.get(arg,[]) == []:
        return []
    lst = []
    for item in dic[arg]:
        lst += flatten(dic,item) + [item]
    return lst 

a = {i:flatten(a,i) for i in a.keys()}
>> {'foo': ['baz', 'bar'], 'bar': ['baz'], 'bar2': ['bax']}

NOTE: As noted in the comments, this implementation does not work when you are working with:

A key value pair where the key is contained in the value
Two key value pairs where key 1 is contained in value 2 and key 2 in value 1 These two scenarios create a cycle, which, will never end. To solve the first scenario, make sure for each key value pair, that the key is not contained in said value.

# Scenario 1
a['foo'] = ['foo', 'bar']
a['bar'] = ['baz']

should be something along the lines of*:

a['foo'] = a['foo'] - ['foo']

The second scenario is a little more tricky. The result of the flatten function is the same for key 1 and key 2. Ultimately, this would mean that calling the flatten function for one of the keys, and having both keys removed from their combined values, would work:

# Scenario 2
a['foo'] = ['bar','baz']
a['bar'] = ['foo','bax']

should therefore be something along the lines of*:

a['foo'] = a['bar'] = a['foo'] + f['bar'] - ['foo','bar']

However, the implementation itself and how you would solve this for different (longer, more complicated) cycles, is what I believe, out of the scope of the question.

*: the minus operator doesn't work on lists like this, but I bet you could come up with a solution for that, if you are going to look into this.

NOTE: You mentioned in the comments that you do not want 'bar' as key of your flattened dictionary. You will want to find the root of the dictionary such that any children are not keys. In order to do that, you need check for each key, if it is not in a value of the same or other keys. This code should do the trick:

allValues = []
for item in a.values():
    allValues += item
roots = {key:a[key] for key in a.keys() if key not in allValues}
>> {'foo': ['baz', 'bar'], 'bar2': ['bax']}

First it combines all values of the keys. Then it will make the dictionary by looking at which key does not occur in the list of all of the values. The dictionary roots gives you the dictionary as desired.

Modifying to avoid cyclic "references" is an exercise for the OP. Imagine flattening `{"bar": ["foo"], "foo": ["bar"]}` — Chris, Dec 05 '22 at 23:39
Be careful in Python of using `+` in a loop like this. Sometimes it can't be helped, but in Python it will require the copying of the first list, plus the new item, into a new list. — Nathaniel Ford, Dec 06 '22 at 00:07
Hello, thank you for your detailed response - however have you seen my second example. I do not expect the key `bar`. This would still do this. — maxisme, Dec 06 '22 at 09:32
In order to overcome that, you would want to find the root of the tree. You could do that by checking which key does not occur as value to the same or other keys. I will edit that part in. — Lexpj, Dec 06 '22 at 10:45

Nathaniel Ford · Answer 2 · 2022-12-06T00:10:15.533

I wanted to offer a solution that utilizes a reverse lookup, since that is often a strategy that works well. I've renamed the variables to be clearer.

single_parent = {
    "a": ["b"],
    "b": ["c", "d"],
    "c": ["e"],
}

multiple_parent = {
    "a": ["b"],
    "b": ["c"],
    "d": ["e"],
}


def urparent(reverse: dict, start: str) -> str:
    if start in reverse:
        return urparent(reverse, reverse.get(start))
    return start


def flatten_refs(dx: dict) -> dict:
    reverse = dict()
    for key, refs in dx.items():
        for ref in refs:
            reverse[ref] = key

    forward = dict()
    for key, parent in reverse.items():
        apex = urparent(reverse, parent)
        current = forward.get(apex, [])
        current.append(key)
        forward[apex] = current

    return forward


print(flatten_refs(single_parent))
print(flatten_refs(multiple_parent))

Which outputs:

{'a': ['b', 'c', 'd', 'e']}  # single_parent
{'a': ['b', 'c'], 'd': ['e']}  # multiple_parent

Essentially, you visit each element of each list and create a 'reverse lookup' dictionary, wherein the element points to it's parent. Then, for each element in that reverse lookup, you trace up to the parent level. Worst-case this is O(n^2), but you can easily optimize urparent() to cache results. Further, that function can be modified to ensure that loops don't cause issues. (What your algorithm should do in this situation is unclear - perhaps the parent is the last item before it loops, perhaps it throws it out, perhaps something else?)

score -1 · Answer 3 · answered Dec 05 '22 at 22:08

This seems to work:

a = {}
a["bax"] = ["test"]
a["foo"] = ["bar"]
a["bar"] = ["baz", "baz2"]
a["baz"] = ["bax"]
a["aaa"] = ["bbb"]
a["ccc"] = ["ddd", "aaa", "eee"]
a["123"] = ["456"]

vals = list(a.values())
flag = True
while flag:
    flag = False
    for v in vals:
        for st in v:
            if st in a:
                flag = True
                v += a[st]
                a.pop(st)

a
# Out[103]: 
# {'foo': ['bar', 'baz', 'baz2', 'bax', 'test'],
#  'ccc': ['ddd', 'aaa', 'eee', 'bbb'],
#  '123': ['456']}

flattening linked dictionary values

3 Answers3