Here's an updated version of my answer in which leaves of the tree data-structure are now different from those in rest of it. Instead of the tree being strictly a dict
-of-nested-dict
s, the "leaves" on each branch are now instances of a different subclass of dict
named collections.Counter
which are useful for counting the number of times each of their keys occur. I did this because of your response to my question about what should happen if the last part of each line was something other than ":pass"
(which was "we have to put new count for that key").
Nested dictionaries are often called Tree
data-structures and can be defined recursively — the root is a dictionary as are the branches. The following uses a dict
subclass instead of a plain dict
because it makes constructing them easier since you don't need to special case the creation of the first branch of next level down (except I still do when adding the "leaves" because they are a different subclass, collections.Counter
).
from collections import Counter
from functools import reduce
import re
# (Optional) trick to make Counter subclass print like a regular dict.
class Counter(Counter):
def __repr__(self):
return dict(self).__repr__()
# Borrowed from answer @ https://stackoverflow.com/a/19829714/355230
class Tree(dict):
def __missing__(self, key):
value = self[key] = type(self)()
return value
# Utility functions based on answer @ https://stackoverflow.com/a/14692747/355230
def nested_dict_get(nested_dict, keys):
return reduce(lambda d, k: d[k], keys, nested_dict)
def nested_dict_set(nested_dict, keys, value):
nested_dict_get(nested_dict, keys[:-1])[keys[-1]] = value
def nested_dict_update_count(nested_dict, keys):
counter = nested_dict_get(nested_dict, keys[:-1])
if counter: # Update existing Counter.
counter.update([keys[-1]])
else: # Create a new Counter.
nested_dict_set(nested_dict, keys[:-1], Counter([keys[-1]]))
d = Tree()
pat = re.compile(r'[a-zA-z]+')
with open('abc.txt') as file:
for line in file:
nested_dict_update_count(d, [w for w in pat.findall(line.rstrip())])
print(d) # Prints like a regular dict.
To test the leaf-counting capabilities of the revised code, I used the following test file which includes the same line twice, once ending again with :pass
and another ending in :fail
.
Expanded abc.txt
test file:
abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass
abc/pqr/lmn/xyz:fail
abc/pqr/lmn/xyz:pass
Output:
{'abc': {'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'fail': 1, 'pass': 2}}}}}