0

I'm working with the Enron email data set. It's a dictionary of dictionaries, where each name in the outer dictionary is a key for another set of features. To give an idea, it looks something like this:

enron = {'Mark' : {'salary': 10, 'employed': 'yes'}, 'Ted' : {'salary': 5, 'employed': 'yes'}}

Except the real data set is of course much larger with many more features. If I want to get a list of the features, I do something like:

for key in enron['Mark']:
    print(key)

This works well enough but seems kind of lazy. Is there a more generic function in Python that can automatically reach a certain layer of nested dictionaries? I'm afraid I might one day have to work with a multi-level dictionary, and I'd rather not have to write variations of:

for key in dic['a']['b']['c']:

over and over again.

Nicholas Hassan
  • 2
    What should your expected result be? `[Mark, salary, employed, Ted, salary, employed]`, or just `[salary, employed, salary, employed]`, or something like `[Mark-salary, Mark-employed, Ted-salary, Ted-employed]`? – tobias_k Mar 23 '17 at 17:08
  • 1
    "kind of lazy"? Who/what is lazy here? – Scott Hunter Mar 23 '17 at 17:11
  • 1
    If you have a multi-level dictionary, *somebody* has to write `dic['a']['b']['c']`. – Scott Hunter Mar 23 '17 at 17:12
  • Python has no way of knowing the relationship between the dictionaries that are values of another dictionary: for all it knows, they could have a completely different structure. So no, there is not a "generic function" that will go to a certain layer, though something like @tobias_k's suggestion would be pretty simple. – brianpck Mar 23 '17 at 17:13
  • 1
    Maybe XPath-like queries on dictionaries are the way to go. Take a look at the dpath package: https://pypi.python.org/pypi/dpath – Patrik Polakovic Mar 23 '17 at 17:21
  • @tobias_k In this scenario I would want just `salary employed`, but I'm sure that any method to get that could be adapted to get those similar outputs. Of course, the list of features is much longer – Nicholas Hassan Mar 23 '17 at 17:25
  • @ScottHunter I guess lazy was a poor choice of words, inefficient is better. At the moment I print the keys in the dictionary, choose a random key, and then print the keys again from dictionary[key]. Especially if I want to automate this, for different dictionaries, I can't guarantee that I could randomly choose enron['Mark'] and have it be applicable – Nicholas Hassan Mar 23 '17 at 17:27
  • possibly related or helpful [Find all occurences of a key in nested python dictionaries and lists](http://stackoverflow.com/questions/9807634/find-all-occurences-of-a-key-in-nested-python-dictionaries-and-lists) – chickity china chinese chicken Mar 23 '17 at 17:29
  • You can probably use print `enron.popitem()[1].keys()`. The number of `popitem` calls you need depends on the depth of the dict. – Keerthana Prabhakaran Mar 23 '17 at 17:46

3 Answers

1

Is this similar to what you wanted?

enron = {'Mark': {'salary': 10, 'employed': {'boogie': 'obviously'}}, 
        'Ted': {'salary': 5, 'employed': 'yes'}}


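def get_nested_keys(dictionary, dict_keys):
    """Return the keys of the dict reached by following dict_keys in order."""
    return list(recursive_nested_keys(dictionary, dict_keys))


def recursive_nested_keys(dictionary, dict_keys):
    # Descend one level using the first key in the path.
    inner = dictionary[dict_keys[0]]
    if len(dict_keys) == 1:
        # Last key in the path: return the keys found at this level.
        return inner.keys()
    # Otherwise keep descending with the remaining keys.
    return recursive_nested_keys(inner, dict_keys[1:])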

print(get_nested_keys(enron, ('Mark',)))
print(get_nested_keys(enron, ('Mark','employed')))

That prints (key order can differ between Python versions):

['employed', 'salary']
['boogie']
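
If you would rather not pick a name such as 'Mark' at all, one option is to collect the keys found at a given depth across every branch. The following is only a sketch: keys_at_depth is a made-up helper name, not a built-in, and it assumes that anything that is not a dict counts as a leaf and can be skipped.

```python
def keys_at_depth(d, depth):
    """Collect the distinct keys found `depth` levels below the top of `d`.

    depth=0 gives the top-level keys, depth=1 the keys of every nested
    dict one level down, and so on. Values that are not dicts are skipped.
    (Hypothetical helper, written for illustration only.)
    """
    if depth == 0:
        return list(d)
    found = []
    for value in d.values():
        if isinstance(value, dict):
            for key in keys_at_depth(value, depth - 1):
                if key not in found:  # keep first-seen order, drop repeats
                    found.append(key)
    return found


enron = {'Mark': {'salary': 10, 'employed': 'yes'},
         'Ted': {'salary': 5, 'employed': 'yes'}}

print(keys_at_depth(enron, 0))  # the names
print(keys_at_depth(enron, 1))  # the features, each listed once
```

Because it merges the keys of all entries, it still works when some people are missing a feature that others have. Asking for a depth deeper than the data goes simply returns an empty list.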
BoobyTrap
1

'NestedDict' allows you to get the keys of a nested dictionary as tuples, using the same syntax that you would use for dictionaries.

First install ndicts

pip install ndicts

Then

from ndicts.ndicts import NestedDict

enron = {'Mark' : {'salary': 10, 'employed': 'yes'}, 'Ted' : {'salary': 5, 'employed': 'yes'}}

nd = NestedDict(enron)
keys = list(nd.keys())

`keys` is now a list of tuples

>>> keys
[('Mark', 'salary'), ('Mark', 'employed'), ('Ted', 'salary'), ('Ted', 'employed')]

You can access any level with a list comprehension

>>> [key[0] for key in keys]
['Mark', 'Mark', 'Ted', 'Ted']
>>> [key[1] for key in keys]
['salary', 'employed', 'salary', 'employed']

If your nested dictionary has varying depth, you may run into an IndexError. You can avoid that by adding a condition to the comprehension

>>> [key[2] for key in keys if len(key) > 2]
[]
>>> # Empty because the maximum depth of enron is 2, but no exception is raised
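
If you prefer not to install anything, a similar list of key tuples can be built with a short recursive generator. This is a plain-Python sketch and not how ndicts is implemented; nested_key_paths is a hypothetical name, and the sketch assumes that only dict values are descended into and that an empty dict counts as a leaf.

```python
def nested_key_paths(d, prefix=()):
    """Yield one tuple of keys per leaf value in a nested dict.

    Hypothetical helper for illustration; only dict values are treated
    as nested levels, and an empty dict is treated as a leaf.
    """
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            # Non-empty dict: keep descending, carrying the path so far.
            yield from nested_key_paths(value, path)
        else:
            yield path


enron = {'Mark': {'salary': 10, 'employed': 'yes'},
         'Ted': {'salary': 5, 'employed': 'yes'}}

keys = list(nested_key_paths(enron))
print(keys)
```

The same list comprehensions shown above then work on this list as well.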
edd313
0

Give the addict library a try: https://github.com/mewwts/addict

It lets you write dic.a.b.c instead of dic["a"]["b"]["c"], if that is the kind of shortcut you are asking for.
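
To give a rough idea of how attribute-style access can work, here is a minimal read-only sketch using a dict subclass. It is not addict's actual implementation; AttrDict is a hypothetical name, and the sketch assumes all keys are strings that are valid Python identifiers.

```python
class AttrDict(dict):
    """Read-only sketch: look up missing attributes as dictionary keys.

    Hypothetical class for illustration, not the addict library. Names
    that clash with dict methods (keys, items, ...) still resolve to the
    methods, because __getattr__ only runs when normal lookup fails.
    """

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so the dotted access can continue.
        return AttrDict(value) if isinstance(value, dict) else value


dic = AttrDict({'a': {'b': {'c': {'x': 1, 'y': 2}}}})

for key in dic.a.b.c:
    print(key)
```

Each nested dict is wrapped in a fresh copy on access, so this is only suitable for reading values, not for assigning through the dotted path.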

Chuancong Gao