Simple way to group items into buckets

Question

I often want to bucket an unordered collection in python. itertools.groubpy does the right sort of thing but almost always requires massaging to sort the items first and catch the iterators before they're consumed.

Is there any quick way to get this behavior, either through a standard python module or a simple python idiom?

>>> bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
{False: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p',
    's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g'],
 True: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']}
>>> bucket(xrange(21), lambda x: x % 10)
{0: [0, 10, 20],
 1: [1, 11],
 2: [2, 12],
 3: [3, 13],
 4: [4, 14],
 5: [5, 15],
 6: [6, 16],
 7: [7, 17],
 8: [8, 18],
 9: [9, 19]}

score 23 · Accepted Answer · edited May 23 '17 at 11:53

This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library.. although I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)

When I need this behaviour, I use

from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

and get on with my day.

grieve · Answer 2 · 2012-10-04T04:52:06.833

6

Here is a simple two liner

d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)

Edit:

Just adding your other case for completeness.

d={}
for x in xrange(21): d.setdefault(x%10, []).append(x)

edited Oct 04 '12 at 04:52

answered Oct 04 '12 at 04:33

grieve

13,220
10
49
61

I always find `defaultdict` to be nicer than d.setdefault.etc type tricks – wim Oct 04 '12 at 04:58
@wim: Yeah, I always forget that it exists. That is why I up-voted DSM's answer. – grieve Oct 04 '12 at 05:00
2

wim: actually, I like `setdefault` better than `defaultdict`. It's pretty much the same amount of code either way, but `setdefault` is explicit about it, you can use it on existing dicts as you find you need it. – Mu Mind Oct 05 '12 at 08:34

score 2 · Answer 3 · answered Jan 26 '15 at 12:16

Here's a variant of partition() from above when the predicate is boolean, avoiding the cost of a dict/defaultdict:

def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing

Example usage:

>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]

Boern · Answer 4 · 2017-11-09T21:20:01.683

If its a pandas.DataFrame the following also works, utilizing pd.cut()

from sklearn import datasets
import pandas as pd

# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature

# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)

yielding

0      0_1
1      0_0
2      0_0
[...]

In case you don't set the labels, the output is going to like this

0       (5.02, 5.74]
1      (4.296, 5.02]
2      (4.296, 5.02]
[...]

korylprince · Answer 5 · 2012-10-04T05:23:31.710

-1

Edit:

Using DSM's answer as a start, here is a slightly more concise, general answer:

d = defaultdict(list)
map(lambda x: d[x in 'aeiou'].append(x),'thequickbrownfoxjumpsoverthelazydog')

or

d = defaultdict(list)
map(lambda x: d[x %10].append(x),xrange(21))

#

Here is a two liner:

d = {False:[],True:[]}
filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")

Which can of course be made a one-liner:

d = {False:[],True:[]};filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")

edited Oct 04 '12 at 05:23

answered Oct 04 '12 at 04:44

korylprince

2,969
1
18
27

-1 The keys aren't always False and True, they should be the outputs of the callable – wim Oct 04 '12 at 04:49
I'd recommend the `for x in 'thequickbrownfoxjumpsoverthelazydog': d[x in 'aeiou'].append(x)` form (as in grieve's answer). I'm not really a fan of using `map` for side effects and throwing away the value. – Mu Mind Oct 04 '12 at 05:35
The problem with grieves answer, though it works well, is that it costs a bit more computationally. (it calls setdefault for every element, repeated or not.) I like DSMs answer, but I wanted to see if I could get a one liner (or close to it.) – korylprince Oct 04 '12 at 05:46
2

it's not really a one-liner if you're just joining two lines together with `;` ! – wim Oct 04 '12 at 06:04
1

I wasn't saying to not use `defaultdict`, I meant that changing a for loop to `map` doesn't make it more concise, it just makes it more confusing. – Mu Mind Oct 04 '12 at 07:54
@wim, yes true. It does do all of the computing in one line. Mu Mind, that really depends on your perspective. Using map is a more functional style. Eh. To each his own. – korylprince Oct 04 '12 at 14:29

Simple way to group items into buckets

5 Answers5

Linked