Randomly chunk variables to groups of a certain number

Question

I have a large pandas dataframe in which I am trying to randomly chunk objects into groups of a fixed size. For example, I want to chunk the objects below into groups of 3, where every object in a group must be of the same type. Here's a toy dataset:

type     object       index

ball     soccer       1
ball     soccer       2
ball     basket       1
ball     bouncy       1
ball     tennis       1
ball     tennis       2
chair    office       1
chair    office       2
chair    office       3
chair    lounge       1
chair    dining       1
chair    dining       2
...      ...          ...

Desired output:

type     object       index    group

ball     soccer       1        ball_1
ball     soccer       2        ball_1
ball     basket       1        ball_1
ball     bouncy       1        ball_1
ball     tennis       1        ball_2
ball     tennis       2        ball_2
chair    office       1        chair_1
chair    office       2        chair_1
chair    office       3        chair_1
chair    lounge       1        chair_1
chair    dining       1        chair_1
chair    dining       2        chair_1
...      ...          ...      ...

So here, the group ball_1 contains 3 unique objects of the same type: soccer, basket, and bouncy. The remaining object (tennis) goes into group ball_2, which has only 1 object. Since the dataframe is so large, I'm hoping for a long list of groups that each contain 3 objects, plus one group per type that holds the leftover objects (fewer than 3).

Again, while my example only contains a few objects, I'm hoping for the objects to be randomly sorted into groups of 3. (My real dataset will contain many more balls and chairs.)

This seemed helpful, but I haven't figured out how to apply it yet: "How do you split a list into evenly sized chunks?"

score 0 · Accepted Answer · answered Aug 05 '20 at 06:12

To put every N unique objects of each type into one group, use pd.factorize inside GroupBy.transform to number the unique objects within each type, integer-divide by N and add 1 to get the group number, then prepend the type column with Series.str.cat:

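import pandas as pd

N = 3  # number of unique objects per group
# Number the unique objects within each type (0, 1, 2, ...), then integer-divide by N for a 1-based group number
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1

# Prefix each group number with its type, e.g. ball_1
df['group'] = df['type'].str.cat(g.astype(str), sep='_')
print(df)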
     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6   chair  office      1  chair_1
7   chair  office      2  chair_1
8   chair  office      3  chair_1
9   chair  lounge      1  chair_1
10  chair  dining      1  chair_1
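
To see why the integer division produces the group numbers, it may help to look at the intermediate values for a single type. This is only a small sketch using the ball rows from the question; the variable names are made up for illustration:

```python
import pandas as pd

# The 'object' values of the ball rows, in their original order
objects = pd.Series(["soccer", "soccer", "basket", "bouncy", "tennis", "tennis"])

# factorize gives each distinct value a code in order of first appearance
codes = pd.factorize(objects)[0]

# Codes 0-2 fall into group 1, codes 3-5 into group 2, and so on
group_numbers = codes // 3 + 1

print(codes.tolist())          # [0, 0, 1, 2, 3, 3]
print(group_numbers.tolist())  # [1, 1, 1, 1, 2, 2]
```

Repeated objects share a code, so all rows of the same object always end up in the same group.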

If the objects should also be assigned to groups at random, shuffle the rows first with DataFrame.sample:

N = 3
# Shuffle all rows so that objects are numbered in random order
df = df.sample(frac=1)
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1

df['group'] = df['type'].str.cat(g.astype(str), sep='_')
print(df)
     type  object  index    group
10  chair  dining      1  chair_1
8   chair  office      3  chair_1
2    ball  basket      1   ball_1
1    ball  soccer      2   ball_1
7   chair  office      2  chair_1
0    ball  soccer      1   ball_1
9   chair  lounge      1  chair_1
4    ball  tennis      1   ball_1
6   chair  office      1  chair_1
3    ball  bouncy      1   ball_2
5    ball  tennis      2   ball_1
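
If the shuffled row order is not wanted in the final result, one option is to wrap the steps in a function and sort by the index afterwards. The sketch below assumes the toy data from the question; the helper name assign_random_groups and its seed parameter are not part of pandas, they are only illustrative. Passing a seed to random_state makes the shuffle repeatable:

```python
import pandas as pd

def assign_random_groups(df, n=3, seed=None):
    """Hypothetical helper: put up to n unique objects of one type in each group."""
    # Shuffle rows; a fixed seed makes the result reproducible
    shuffled = df.sample(frac=1, random_state=seed)
    # Number the unique objects within each type in their shuffled order
    codes = shuffled.groupby("type")["object"].transform(lambda s: pd.factorize(s)[0])
    labels = shuffled["type"].str.cat((codes // n + 1).astype(str), sep="_")
    # Attach the labels and restore the original row order
    return shuffled.assign(group=labels).sort_index()

# Toy data from the question
toy = pd.DataFrame({
    "type": ["ball"] * 6 + ["chair"] * 6,
    "object": ["soccer", "soccer", "basket", "bouncy", "tennis", "tennis",
               "office", "office", "office", "lounge", "dining", "dining"],
    "index": [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1, 2],
})

result = assign_random_groups(toy, n=3, seed=0)
print(result)
print(result.groupby("group")["object"].nunique())
```

Which objects land in ball_1 depends on the seed, but the group sizes do not: with this data, ball_1 holds 3 unique objects, ball_2 holds 1, and chair_1 holds 3.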
