Splitting list into a dictionary of lists by a custom key function

Question

I implemented a small helper function that splits a list into a dictionary by a custom key function. The following implementation works, but feels rather un-pythonic:

def classify_list_by_key(l, key_func):
    result = defaultdict(list)

    for item in l:
        result[key_func(item)].append(item)

    return result

I tried using the following dict comprehension, which obviously has horrible performance with a lot of unnecessary calls to key_func and quadratic run-time.

def classify_list_by_key_dict_comprehension(l, key_func):
    return {key_func(x): [y for y in objects if key_func(x) == key_func(y)] for x in l}

Sample in- and output:

str_list = "these are some test strings for testing classify_list_by_key function".split()
print(classify_list_by_key(str_list, len))

produces

defaultdict(<class 'list'>, {5: ['these'], 3: ['are', 'for'], 4: ['some', 'test'], 7: ['strings', 'testing'], 20: ['classify_list_by_key'], 8: ['function']})

Is there a better / more pythonic way to achieve this using built-ins or standard library modules?

I've been thinking of different options, but I don't find anything better. Also, the second option, not only does it have O(2) but looks less readable to me. [This](https://stackoverflow.com/a/41165820/9415337) is the only thing which has come up, but at the end of the day you are just hiding and remapping the implementation they did. So I think that there is no more Pythonic way. You can also consider using a standard dict and the setdefault function. — miquelvir, Mar 29 '21 at 15:53
Thanks for the pointers, using `setdefault` might actually work well here! I also agree that the dict comprehension does nothing for readability. — riskypenguin, Mar 29 '21 at 15:59
"The following implementation works, but feels rather un-pythonic:" well, it isn't unpythonic. Why do you *think* it is unpythonic? Why would you not use it? You clearly understand your alternative performs worse? — juanpa.arrivillaga, Mar 29 '21 at 16:14

score 1 · Answer 1 · edited Mar 29 '21 at 16:09

1

It is possible using groupby from itertools, but it requires a sorted list for that purpose.

Here is the code for that:

from itertools import groupby
str_list = "these are some test strings for testing classify_list_by_key function".split()
str_list.sort(key=len)
res = {k: list(g) for k, g in groupby(str_list, key=len)}
print(res)

Output

{3: ['are', 'for'], 4: ['some', 'test'], 5: ['these'], 7: ['strings', 'testing'], 8: ['function'], 20: ['classify_list_by_key']}

edited Mar 29 '21 at 16:09

Karl Knechtel

62,466
11
102
153

answered Mar 29 '21 at 15:54

Cute Panda

1,468
1
6
11

Notice the sort is not necessary per se. – miquelvir Mar 29 '21 at 16:02
@miquelvir Yeah, you are right, it works without sort too – Cute Panda Mar 29 '21 at 16:03
1

@miquelvir Modified the code, no need of unnecessary sorting. Thanks for the notice! – Cute Panda Mar 29 '21 at 16:04
3

The sort **is** necessary, so I rolled it back. To convince yourself, try the input strings `['one', 'four', 'two']`. The `one` won't make it into the result, because it will be in a separate group from the `two`, and the latter list will replace the former. – Karl Knechtel Mar 29 '21 at 16:10
@CutePanda, if sorting is needed, I don't think this is any better than the original solution. You need to sort the array AND iterate through it after the groupby. The original solution seems better. – miquelvir Mar 29 '21 at 16:47
Comparison-based sorting is O(n lg n), and the built-in Python sort is quite well optimized. That said, practicality beats purity, and there's no need to make everything follow functional style. – Karl Knechtel Mar 30 '21 at 04:32

miquelvir · Answer 2 · 2021-03-29T16:46:18.723

1

The following is an alternative implementation which does not force to use a defaultdict:

def classify_list_by_key(l, key_func):
    result = {}

    for item in l:
        result.setdefault(key_func(item), []).append(item)

    return result

str_list = "these are some test strings for testing classify_list_by_key function".split()
print(classify_list_by_key(str_list, len))

edited Mar 29 '21 at 16:46

answered Mar 29 '21 at 16:05

miquelvir

1,748
1
7
21

How is this better? It is certainly not more pythonic to re-invent the wheel – juanpa.arrivillaga Mar 29 '21 at 16:15
1

I agree that re-inventing the wheel is not more pythonic. As stated, the solution of itertools is the one which I would use. My point is that, if he needs to implement it himself (for some reason), this implementation can be slightly better than the suggested one. – miquelvir Mar 29 '21 at 16:23
Why would you use the itertools solutions?? It is *more inefficient* than the OP's solution. – juanpa.arrivillaga Mar 29 '21 at 16:38
Well you are right. The itertools solution is actually pretty bad. @Karl Knechtel just explained to us that the sort is necessary, which means that sorting and iterating; it should not be the accepted solution. – miquelvir Mar 29 '21 at 16:46
@juanpa.arrivillaga I have fixed my answer. – miquelvir Mar 29 '21 at 16:46

Splitting list into a dictionary of lists by a custom key function

2 Answers2