4

So, I have read quite a few posts on flattening dictionaries recursively in Python. None (save one) have come close to what I'm looking for. First, a quick example of what I am trying to accomplish:

Example dictionary with mixed entries: (keys and values will always be of mixed types)

{'a': [{'b': {'c': 'd', 'e': 'f', 'g': 'h',
              'i': {'j': {'k': ['l'], 'm': 'n'}},
              'o': {'p': {'q': ['r', 's' ], 't': 'u'}}
              }
       }]
}

Desired output:

{'a/b/c/d',
 'a/b/e/f',
 'a/b/g/h',
 'a/b/i/j/k/l',
 'a/b/i/j/m/n',
 'a/b/o/p/q/r',
 'a/b/o/p/q/s',
 'a/b/o/p/t/u'}

The function should (theoretically) work on lists as well.

To explain a bit about what I am doing, I am attempting to search through a Mac plist and other attempts to search by key or value have been shaky at best. To compensate, I want to try a different approach. Convert the dictionary to a list of 'paths' and then just search the paths.

I tried myself (and partially succeeded) and then I found a better solution in the form of this:

def flatten(structure, key="", path="", flattened=None):
    if flattened is None:
        flattened = {}
    if type(structure) not in(dict, list):
        flattened[((path + "/") if path else "") + key] = structure
    elif isinstance(structure, list):
        for i, item in enumerate(structure):
            flatten(item, "", "/".join(filter(None,[path,key])), flattened)
    else:
        for new_key, value in structure.items():
            flatten(value, new_key, "/".join(filter(None,[path,key])), flattened)
    return flattened

This works well but there are a few undesired effects. First, the output is as follows:

{'a/b/c'     : 'd',
 'a/b/e'     : 'f',
 'a/b/g'     : 'h',
 'a/b/i/j/k/': 'l',
 'a/b/i/j/m' : 'n',
 'a/b/o/p/q/': 's',
 'a/b/o/p/t' : 'u'}

This is returning a dictionary of key/value pairs. I'd rather have a list of string paths. Secondly, and more importantly, you'll notice, the script has stripped values where the value was a list. Only appending the last item of the list.

'a/b/o/p/q/': 's' # there should be another entry with 'r' as the value.

I have spent a fair amount of time fiddling with the output and trying to completely wrap my head around the problem to no avail. It may just be my lac of understanding Python, but the output I want should be possible.

I try not to ask questions unless I have run out of options and here I am. Please do not mark as a duplicate, as other questions aren't quite looking to accomplish what I am looking for.

Thank you for your time and assistance/guidance.

kdougan
  • 333
  • 2
  • 10
  • 2
    What should `{ 'q': [ 'r', 's', 'x'],'t': 'u'}` yield? The portion of your example that generates `'a/b/c/d/o/p/q/r', 'a/b/c/d/o/p/q/r/s'` looks to me like it should be `.../q/r, .../q/s`. – Matt Bryant Oct 31 '14 at 18:37
  • Thank you Matt. You are correct. ('.../q/r', '.../q/s', '.../q/x', '.../t/u') I created the desired output without really thinking. I have edited my question. – kdougan Oct 31 '14 at 18:56
  • Perhaps [flatten nested Python dictionaries, compressing keys](http://stackoverflow.com/questions/6027558/flatten-nested-python-dictionaries-compressing-keys) don't be the answer, but perhaps it could helps. – Mauro Baraldi Oct 31 '14 at 19:10
  • Thank you Mauro for the link. I'll take a look! – kdougan Oct 31 '14 at 19:22

2 Answers2

2

Python 2.7:

def flatten(structure):
    if isinstance(structure, basestring):
        return [structure]
    ret = []
    if isinstance(structure, list):
        for v in structure:
            ret.extend(flatten(v))
    elif isinstance(structure, dict):
        for k, v in structure.items():
            ret.extend(k + '/' + f for f in flatten(v))
    return ret

print sorted(flatten(structure))

Output:

['a/b/c/d', 'a/b/e/f', 'a/b/g/h', 'a/b/i/j/k/l', 'a/b/i/j/m/n', 'a/b/o/p/q/r', 'a/b/o/p/q/s', 'a/b/o/p/t/u']

Or, if you don't care about the order, you can simply print flatten(structure).

irrelephant
  • 4,091
  • 2
  • 25
  • 41
  • Thank you irrelephant! This works perfectly. Strange how it is always the simple solutions. Well, that, combined with my own tendency to overcomplicate. – kdougan Oct 31 '14 at 20:31
1

Here's how I would do it in Python 3.3+:

def flatten(exp):
    def sub(exp, res):
        if type(exp) == dict:
            for k, v in exp.items():
                yield from sub(v, res+[k])
        elif type(exp) == list:
            for v in exp:
                yield from sub(v, res)
        else:
            yield "/".join(res+[exp])
    yield from sub(exp, [])

testing:

l={'a': [{'b': {'c': 'd', 'e': 'f', 'g': 'h',
              'i': {'j': {'k': ['l'], 'm': 'n'}},
              'o': {'p': {'q': ['r', 's' ], 't': 'u'}}
              }
       }]
}

for i in sorted(flatten(l)):
    print(i)

yields

a/b/c/d
a/b/e/f
a/b/g/h
a/b/i/j/k/l
a/b/i/j/m/n
a/b/o/p/q/r
a/b/o/p/q/s
a/b/o/p/t/u

EDIT translation to Python 2 is trivial:

def flatten(exp):
    def sub(exp, res):
        if type(exp) == dict:
            for k, v in exp.items():
                for r in sub(v, res+[k]):
                    yield r
        elif type(exp) == list:
            for v in exp:
                for r in sub(v, res):
                    yield r
        else:
            yield "/".join(res+[exp])
    for r in sub(exp, []):
        yield r

then

>>> for i in sorted(flatten(l)):
...     print i
...
a/b/c/d
a/b/e/f
a/b/g/h
a/b/i/j/k/l
a/b/i/j/m/n
a/b/o/p/q/r
a/b/o/p/q/s
a/b/o/p/t/u
uselpa
  • 18,732
  • 2
  • 34
  • 52
  • 1
    Thank you uselpa. I'm currently using 2.7, so I can't use the solution but I will keep this in mind if/when I upgrade. – kdougan Oct 31 '14 at 20:41
  • @kdougan No problem. Please specify your Python version in your next question. FWIW, I added a translation to Python 2.7. – uselpa Oct 31 '14 at 21:31
  • Noted. Thanks for the translation as well. – kdougan Nov 01 '14 at 01:14