
I'd like to group emails by their domain and convert the result into a dictionary. So far I have figured out that itertools.groupby with a custom key function will do that. It correctly assigns keys to each value, but when I build a dictionary from the result, only the last group for each key survives whenever the values to be grouped are not contiguous.


import re
from itertools import groupby

{k: list(v) for k, v in groupby(["bar", "foo", "baz"], key=lambda x: "to" if re.search(r"^b", x) else "cc")}

This will produce {'to': ['baz'], 'cc': ['foo']} instead of {'to': ['bar', 'baz'], 'cc': ['foo']}.
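Listing the raw groups shows why. groupby starts a new group every time the key changes, so "to" appears twice, and the dict comprehension overwrites the first "to" entry with the second. The following is a minimal illustration that uses the same key rule as above:

```python
import re
from itertools import groupby

def key(x):
    # "to" for strings starting with "b", otherwise "cc" (same rule as above)
    return "to" if re.search(r"^b", x) else "cc"

# "foo" sits between "bar" and "baz", so the "to" run is split in two
groups = [(k, list(v)) for k, v in groupby(["bar", "foo", "baz"], key=key)]
print(groups)  # [('to', ['bar']), ('cc', ['foo']), ('to', ['baz'])]

# A dict keeps only the last value assigned to each key
print({k: v for k, v in groups})  # {'to': ['baz'], 'cc': ['foo']}
```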

How can I fix that?

t3chb0t

2 Answers


Sort the data first to get the correct result (itertools.groupby only groups consecutive items with equal keys):

import re
from itertools import groupby


def key(x):
    # Use the same key function for sorting and for grouping
    return "to" if re.search(r"^b", x) else "cc"


out = {
    k: list(v)
    for k, v in groupby(sorted(["awol", "bar", "foo", "baz"], key=key), key=key)
}

print(out)

Prints:

{'cc': ['awol', 'foo'], 'to': ['bar', 'baz']}
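Applied to the original goal of grouping emails by domain, the same sort-then-group pattern might look like the sketch below. The sample addresses are hypothetical, and the domain is assumed to be everything after the last "@":

```python
from itertools import groupby

# Hypothetical sample addresses, for illustration only
emails = ["ann@example.com", "bob@test.org", "cid@example.com", "dee@test.org"]

def domain(email):
    # Assumption: the domain is the part after the last "@"
    return email.rsplit("@", 1)[-1]

# Sort with the same key function that groupby uses, so equal domains are adjacent
by_domain = {d: list(g) for d, g in groupby(sorted(emails, key=domain), key=domain)}
print(by_domain)
# {'example.com': ['ann@example.com', 'cid@example.com'], 'test.org': ['bob@test.org', 'dee@test.org']}
```

Because sorted is stable, addresses within each domain keep their original relative order.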
Andrej Kesely
  • ["awol" went awol](https://tio.run/##PU7LCoMwELznK5Y9JSC99FKEfklpIWqsweiGTbTa0m@3UdvOZRjmwfg5NtQfT56XxXaeOAIbUTN1YKPhSOQCfI070@CLWQgaIpzhJSChzcHZEOWoNlkTQ5vBCLb/5eVmrAhpxVTygvpBDjPAQvNKNdGunnhV2T/emvnsdFdUGqYcMBKCrdO9QzCay0Yy3orUmxQYFwxgWeJeVuIthGfbR5muqmX5AA)... – Kelly Bundy Aug 26 '22 at 19:50
  • @KellyBundy Good catch, the `key=` parameter in sort needs to be consistent with the function in `.groupby` (edited). – Andrej Kesely Aug 26 '22 at 19:53
  • (And that's why the doc says *"it is usually necessary to have sorted the data using the same key function"*) – Kelly Bundy Aug 26 '22 at 19:53
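The sketch below shows how a sort key that does not match the groupby key can silently drop data. Plain alphabetical sorting is used here only as an illustrative mismatch. It is not necessarily the earlier version of the answer that the comment above refers to.

```python
import re
from itertools import groupby

def key(x):
    return "to" if re.search(r"^b", x) else "cc"

words = ["awol", "bar", "foo", "baz"]

# Alphabetical order ['awol', 'bar', 'baz', 'foo'] gives keys cc, to, to, cc,
# so "cc" forms two separate groups and the first one is overwritten
wrong = {k: list(v) for k, v in groupby(sorted(words), key=key)}
print(wrong)  # {'cc': ['foo'], 'to': ['bar', 'baz']}  ("awol" is lost)

# Sorting by the same key makes equal keys adjacent
right = {k: list(v) for k, v in groupby(sorted(words, key=key), key=key)}
print(right)  # {'cc': ['awol', 'foo'], 'to': ['bar', 'baz']}
```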

You can use dict.setdefault or collections.defaultdict(list) and extend the list for each group, like below:

# from collections import defaultdict
# dct = defaultdict(list)

from itertools import groupby
import re

dct = {}
for k, v in groupby(["awol", "bar", "foo", "baz"],
                    key=lambda x: "to" if re.search(r"^b", x) else "cc"):
    # Create the list the first time a key is seen, then add this group's items
    dct.setdefault(k, []).extend(v)

    # With 'dct = defaultdict(list)' instead, you can extend the list directly:
    # dct[k].extend(v)
print(dct)

{'cc': ['awol', 'foo'], 'to': ['bar', 'baz']}
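Once a defaultdict (or setdefault) accumulates the values, groupby is no longer strictly required. The following sketch uses a different technique, a single pass that appends each item to its key's list. It avoids the sort and keeps the original order within each group:

```python
import re
from collections import defaultdict

def key(x):
    return "to" if re.search(r"^b", x) else "cc"

dct = defaultdict(list)
for item in ["awol", "bar", "foo", "baz"]:
    # Each item goes straight into its key's list; adjacency does not matter
    dct[key(item)].append(item)

print(dict(dct))  # {'cc': ['awol', 'foo'], 'to': ['bar', 'baz']}
```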
I'mahdi
  • @t3chb0t Python added `groupby` in 2003. What was the situation like back then? How many other programming languages even had such grouping functionality back then? – Kelly Bundy Aug 26 '22 at 20:11
  • @AndrejKesely there were databases that had already established what to expect from a `groupby`. They should have named this one `group_consecutive`, or at least made it an option, so that one doesn't have to wonder WTF. I googled and see that people were already wondering about its mechanics 10 years ago https://stackoverflow.com/questions/8116666/itertools-groupby-not-grouping-correctly – t3chb0t Aug 26 '22 at 20:14