3

I have a text file abc.txt:

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass

I need to parse these statements and output should be in nested dictionary as below:

{'abc':{'pqr':{'lmn':{'xyz':{'pass':1},{'bcd':{'pass':1}}}}}}

where 1 is 'pass' count.

I'm able to do as much as this:

import re

d={}
p=re.compile('[a-zA-z]+')
for line in open('abc.txt'):
    for key in p.findall(line):
        d['key']={}
martineau
  • 119,623
  • 25
  • 170
  • 301
sachin27
  • 53
  • 1
  • 1
  • 6

3 Answers3

9

Check out the setdefault method on dictionaries.

d = {}
d.setdefault('pqr', {}).setdefault('lmn', {}).setdefault('xyz', {})['pass'] = 1
d.setdefault('pqr', {}).setdefault('lmn', {}).setdefault('bcd', {})['pass'] = 1
d

gives

{'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'pass': 1}}}}
Andy
  • 49,085
  • 60
  • 166
  • 233
rongenre
  • 1,334
  • 11
  • 21
5

Here's an updated version of my answer in which leaves of the tree data-structure are now different from those in rest of it. Instead of the tree being strictly a dict-of-nested-dicts, the "leaves" on each branch are now instances of a different subclass of dict named collections.Counter which are useful for counting the number of times each of their keys occur. I did this because of your response to my question about what should happen if the last part of each line was something other than ":pass" (which was "we have to put new count for that key").

Nested dictionaries are often called Tree data-structures and can be defined recursively — the root is a dictionary as are the branches. The following uses a dict subclass instead of a plain dict because it makes constructing them easier since you don't need to special case the creation of the first branch of next level down (except I still do when adding the "leaves" because they are a different subclass, collections.Counter).

from collections import Counter
from functools import reduce
import re


# (Optional) trick to make Counter subclass print like a regular dict.
class Counter(Counter):
    def __repr__(self):
        return dict(self).__repr__()


# Borrowed from answer @ https://stackoverflow.com/a/19829714/355230
class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


# Utility functions based on answer @ https://stackoverflow.com/a/14692747/355230
def nested_dict_get(nested_dict, keys):
    return reduce(lambda d, k: d[k], keys, nested_dict)

def nested_dict_set(nested_dict, keys, value):
    nested_dict_get(nested_dict, keys[:-1])[keys[-1]] = value

def nested_dict_update_count(nested_dict, keys):
    counter = nested_dict_get(nested_dict, keys[:-1])
    if counter:  # Update existing Counter.
        counter.update([keys[-1]])
    else:  # Create a new  Counter.
        nested_dict_set(nested_dict, keys[:-1], Counter([keys[-1]]))


d = Tree()
pat = re.compile(r'[a-zA-z]+')
with open('abc.txt') as file:
    for line in file:
        nested_dict_update_count(d, [w for w in pat.findall(line.rstrip())])

print(d)  # Prints like a regular dict.

To test the leaf-counting capabilities of the revised code, I used the following test file which includes the same line twice, once ending again with :pass and another ending in :fail.

Expanded abc.txt test file:

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass
abc/pqr/lmn/xyz:fail
abc/pqr/lmn/xyz:pass

Output:

{'abc': {'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'fail': 1, 'pass': 2}}}}}
martineau
  • 119,623
  • 25
  • 170
  • 301
0

If i understand your question:

sources = ["abc/pqr/lmn/xyz:pass", "abc/pqr/lmn/bcd:pass", "abc/pqr/lmn/xyz:pass"]


def prepare_source(source):
    path, value = source.split(':')
    elements = path.split('/')
    return elements, value


def add_key(elements, value):
    result = dict()
    if len(elements) > 1:
        result[elements[0]] = add_key(elements[1:], value)

    else:
        result[elements[0]] = {value: 1}

    return result


# base merge function get from here:
# http://stackoverflow.com/questions/7204805/dictionaries-of-dictionaries-merge
def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif isinstance(a[key], int) and isinstance(b[key], int):
                a[key] += b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a


result = dict()

for source in sources:
    result = merge(result, add_key(*prepare_source(source)))

print result

Output will be:

{'abc': {'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'pass': 2}}}}}
Sergeev Andrew
  • 463
  • 4
  • 7