9

Trying to understand how to create nested dictionaries on the fly. Ideally my dictionary would look something like:

mydict = {'Message 114861156': {'emails': ['user1@domain.com', 'user2@domain.com'], 'status': 'Queued mail for delivery'}}

Here's what I have so far:

import re

sampledata = ["Message 114861156 to user1@domain.com user2@domain.com  [InternalId=260927844] Queued mail for delivery'."]

def makedict(results):
  newdict = {}
  for item in results:
    msgid = re.search(r'Message \d+', item)
    msgid = msgid.group()
    newdict[msgid]['emails'] = re.findall(r'\w+@\w+\.\w+', item)  # KeyError is raised here
    newdict[msgid]['status'] = re.findall(r'Queued mail for delivery', item)
  return newdict

makedict(sampledata)

This produces the following output:

Traceback (most recent call last):
  File "wildfires.py", line 57, in <module>
    striptheshit(q_result)
  File "wildfires.py", line 47, in striptheshit
    newdict[msgid]['emails'] = re.findall(r'\w+@\w+\.\w+', item)
KeyError: 'Message 114861156'

How do you make a nested dictionary like this on the fly?

Vincent Savard
dobbs
    FYI, legal email addresses can match a heck of a lot more patterns than `r'\w+@\w+\.\w+'`. If you're not in a constrained environment (all e-mail addresses are on some corporate domain), that regex is no good. You can [read more here](http://www.regular-expressions.info/email.html) (it includes a "mostly sufficient" regex and an RFC compliant insane regex). – ShadowRanger Feb 19 '16 at 20:40

3 Answers

11

dict.setdefault is a good tool for this, and so is collections.defaultdict.

Your problem is that newdict starts out as an empty dictionary, so newdict[msgid] refers to a non-existent key. That is fine when you assign to the key directly (newdict[msgid] = "foo"), but newdict[msgid]['emails'] = ... first has to look up newdict[msgid], and since nothing has been stored there yet, that lookup raises a KeyError.
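To make the difference between assigning to a missing key and indexing into one concrete, here is a minimal sketch. The keys "a" and "b" are only illustrative and do not come from the question.

```python
newdict = {}

# Plain assignment creates the key, so this works on an empty dict.
newdict["a"] = "foo"

# Nested assignment first has to *read* newdict["b"], which does not exist yet.
try:
    newdict["b"]["emails"] = []
    raised = False
except KeyError:
    raised = True

print(raised)
```

The second assignment never gets as far as storing anything: the lookup of newdict["b"] fails first, so newdict still only contains "a".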

dict.setdefault lets you sidestep that by saying "If msgid exists in newdict, give me its value. If not, set its value to {} and give me that instead."

import re

def makedict(results):
    newdict = {}
    for item in results:
        msgid = re.search(r'Message \d+', item).group()
        # setdefault returns the existing inner dict, or stores and returns a new {}
        newdict.setdefault(msgid, {})['emails'] = ...
        # Now you KNOW that newdict[msgid] is there, because setdefault just created it if it wasn't
        newdict[msgid]['status'] = ...
    return newdict
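For reference, a filled-in version of the setdefault approach might look like the sketch below. It reuses the regexes from the question. The `None` check, the choice to store the status as a single string rather than a list, and wrapping the sample line in a list are my assumptions, not part of the original code.

```python
import re

# Sample log line adapted from the question, wrapped in a list so the loop sees whole lines.
sampledata = [
    "Message 114861156 to user1@domain.com user2@domain.com "
    "[InternalId=260927844] Queued mail for delivery"
]

def makedict(results):
    newdict = {}
    for item in results:
        match = re.search(r'Message \d+', item)
        if match is None:
            continue  # assumption: silently skip lines without a message id
        # Fetch the inner dict for this message, creating it on first sight.
        entry = newdict.setdefault(match.group(), {})
        entry['emails'] = re.findall(r'\w+@\w+\.\w+', item)
        status = re.search(r'Queued mail for delivery', item)
        entry['status'] = status.group() if status else None
    return newdict

result = makedict(sampledata)
print(result)
```

Holding on to the dict returned by setdefault avoids looking up newdict[msgid] again for every field.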

Using collections.defaultdict saves you the step of calling dict.setdefault. A defaultdict is initialized with a factory function; whenever you look up a key that doesn't exist, it calls the factory and stores the result as that key's value, e.g.

from collections import defaultdict

foo = defaultdict(list)
# foo is now a dictionary object where every missing key gets a fresh `list()` as its value
foo["bar"].append(1)  # foo["bar"] becomes a list the first time it's accessed, so we can append immediately

You can use this to say "Hey, if I talk to you about a new msgid, I want it to be a new dictionary."

import re
from collections import defaultdict

def makedict(results):
    newdict = defaultdict(dict)  # every missing msgid gets a fresh {}
    for item in results:
        msgid = re.search(r'Message \d+', item).group()
        newdict[msgid]['emails'] = ...
        newdict[msgid]['status'] = ...
    return newdict
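One thing worth knowing before switching to defaultdict: simply reading a missing key creates it. The sketch below illustrates that behaviour with made-up message ids (none of these keys are from the question) and shows converting back to a plain dict once you're done building.

```python
from collections import defaultdict

newdict = defaultdict(dict)
newdict['Message 1']['status'] = 'Queued mail for delivery'

# Merely indexing a missing key inserts it with an empty dict as its value.
_ = newdict['Message 2']
keys = sorted(newdict)

# Membership tests and .get() do not insert anything.
has_three = 'Message 3' in newdict
maybe_four = newdict.get('Message 4')

# A plain dict raises KeyError again on missing keys, which is often what callers expect.
plain = dict(newdict)
print(keys, has_three, maybe_four)
```

If accidental key creation would be a problem, returning dict(newdict) from the function, or sticking with setdefault, may be the safer choice.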
Adam Smith
2

Found what I was looking for in this regard at https://quanttype.net/posts/2016-03-29-defaultdicts-all-the-way-down.html

def fix(f):
    return lambda *args, **kwargs: f(fix(f), *args, **kwargs)

>>> from collections import defaultdict
>>> d = fix(defaultdict)()
>>> d["a"]["b"]["c"]
defaultdict(<function <lambda> at 0x105c4bed8>, {})
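An arguably more readable way to get the same "nested to any depth" behaviour is a named factory that refers to itself. This is a sketch of that alternative, not what the linked post uses; the names `tree` and `to_dict` are my own.

```python
from collections import defaultdict

def tree():
    # Each missing key gets another tree, so nesting works to any depth.
    return defaultdict(tree)

def to_dict(node):
    # Recursively convert nested defaultdicts back into plain dicts.
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

d = tree()
d["a"]["b"]["c"] = 1
nested = to_dict(d)
print(nested)
```

Both spellings produce a defaultdict whose default factory builds another defaultdict of the same kind; the named function just makes the recursion explicit.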
sampson
-1

You need to create newdict[msgid] as an empty dictionary before storing items in it.

newdict[msgid] = {}
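Note that if the same msgid can appear on more than one line, an unconditional newdict[msgid] = {} would throw away what was stored earlier. A guarded version might look like this sketch; the `records` data is hypothetical and only there to show a repeated id.

```python
# Hypothetical (msgid, email) pairs; "Message 1" appears twice on purpose.
records = [
    ("Message 1", "a@example.com"),
    ("Message 2", "c@example.com"),
    ("Message 1", "b@example.com"),
]

newdict = {}
for msgid, email in records:
    # Only create the inner dict the first time this msgid is seen.
    if msgid not in newdict:
        newdict[msgid] = {"emails": []}
    newdict[msgid]["emails"].append(email)

print(newdict)
```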
John Gordon