1

I implemented a small helper function that splits a list into a dictionary by a custom key function. The following implementation works, but feels rather un-pythonic:

def classify_list_by_key(l, key_func):
    result = defaultdict(list)

    for item in l:
        result[key_func(item)].append(item)

    return result

I tried using the following dict comprehension, which obviously has horrible performance with a lot of unnecessary calls to key_func and quadratic run-time.

def classify_list_by_key_dict_comprehension(l, key_func):
    return {key_func(x): [y for y in objects if key_func(x) == key_func(y)] for x in l}

Sample in- and output:

str_list = "these are some test strings for testing classify_list_by_key function".split()
print(classify_list_by_key(str_list, len))

produces

defaultdict(<class 'list'>, {5: ['these'], 3: ['are', 'for'], 4: ['some', 'test'], 7: ['strings', 'testing'], 20: ['classify_list_by_key'], 8: ['function']})

Is there a better / more pythonic way to achieve this using built-ins or standard library modules?

riskypenguin
  • 2,139
  • 1
  • 10
  • 22
  • 1
    I've been thinking of different options, but I don't find anything better. Also, the second option, not only does it have O(2) but looks less readable to me. [This](https://stackoverflow.com/a/41165820/9415337) is the only thing which has come up, but at the end of the day you are just hiding and remapping the implementation they did. So I think that there is no more Pythonic way. You can also consider using a standard dict and the setdefault function. – miquelvir Mar 29 '21 at 15:53
  • 1
    Thanks for the pointers, using `setdefault` might actually work well here! I also agree that the dict comprehension does nothing for readability. – riskypenguin Mar 29 '21 at 15:59
  • "The following implementation works, but feels rather un-pythonic:" well, it isn't unpythonic. Why do you *think* it is unpythonic? Why would you not use it? You clearly understand your alternative performs worse? – juanpa.arrivillaga Mar 29 '21 at 16:14

2 Answers2

1

It is possible using groupby from itertools, but it requires a sorted list for that purpose.

Here is the code for that:

from itertools import groupby
str_list = "these are some test strings for testing classify_list_by_key function".split()
str_list.sort(key=len)
res = {k: list(g) for k, g in groupby(str_list, key=len)}
print(res)

Output

{3: ['are', 'for'], 4: ['some', 'test'], 5: ['these'], 7: ['strings', 'testing'], 8: ['function'], 20: ['classify_list_by_key']}
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Cute Panda
  • 1,468
  • 1
  • 6
  • 11
  • Notice the sort is not necessary per se. – miquelvir Mar 29 '21 at 16:02
  • @miquelvir Yeah, you are right, it works without sort too – Cute Panda Mar 29 '21 at 16:03
  • 1
    @miquelvir Modified the code, no need of unnecessary sorting. Thanks for the notice! – Cute Panda Mar 29 '21 at 16:04
  • 3
    The sort **is** necessary, so I rolled it back. To convince yourself, try the input strings `['one', 'four', 'two']`. The `one` won't make it into the result, because it will be in a separate group from the `two`, and the latter list will replace the former. – Karl Knechtel Mar 29 '21 at 16:10
  • @CutePanda, if sorting is needed, I don't think this is any better than the original solution. You need to sort the array AND iterate through it after the groupby. The original solution seems better. – miquelvir Mar 29 '21 at 16:47
  • Comparison-based sorting is O(n lg n), and the built-in Python sort is quite well optimized. That said, practicality beats purity, and there's no need to make everything follow functional style. – Karl Knechtel Mar 30 '21 at 04:32
1

The following is an alternative implementation which does not force to use a defaultdict:

def classify_list_by_key(l, key_func):
    result = {}

    for item in l:
        result.setdefault(key_func(item), []).append(item)

    return result

str_list = "these are some test strings for testing classify_list_by_key function".split()
print(classify_list_by_key(str_list, len))
miquelvir
  • 1,748
  • 1
  • 7
  • 21
  • How is this better? It is certainly not more pythonic to re-invent the wheel – juanpa.arrivillaga Mar 29 '21 at 16:15
  • 1
    I agree that re-inventing the wheel is not more pythonic. As stated, the solution of itertools is the one which I would use. My point is that, if he needs to implement it himself (for some reason), this implementation can be slightly better than the suggested one. – miquelvir Mar 29 '21 at 16:23
  • Why would you use the itertools solutions?? It is *more inefficient* than the OP's solution. – juanpa.arrivillaga Mar 29 '21 at 16:38
  • Well you are right. The itertools solution is actually pretty bad. @Karl Knechtel just explained to us that the sort is necessary, which means that sorting and iterating; it should not be the accepted solution. – miquelvir Mar 29 '21 at 16:46
  • @juanpa.arrivillaga I have fixed my answer. – miquelvir Mar 29 '21 at 16:46