Pandas subset of DataFrame with a fixed number of values per class

Question

I have a pandas DataFrame df that contains a column ID and a column Type. Here is an example:

print(df)

>>
+---------+---------+
|       ID|     Type|
+---------+---------+
|      AAA|        A|
|      BBB|        B|
|      CCC|        B|
|      DDD|        A|
|      EEE|        B|
|      FFF|        A|
|      GGG|        B|
+---------+---------+

From that DataFrame, I want to extract a sub-DataFrame containing X distinct IDs for each Type.

Here is the expected result for the previous example (the order does not matter):

X = 2
new_df = do_something(df, X)
print(new_df)

>>
+---------+---------+
|       ID|     Type|
+---------+---------+
|      AAA|        A|
|      DDD|        A|
|      BBB|        B|
|      CCC|        B|
+---------+---------+

Is there a simple way to do that?

score 2 · Accepted Answer · answered May 22 '19 at 08:29

Use groupby followed by head (DataFrameGroupBy.head), which keeps the first n rows of each group:

import pandas as pd

# df is the DataFrame from the question; keep the first 2 rows of each Type
df.groupby('Type').head(2)

Output:

    ID Type
0  AAA    A
1  BBB    B
2  CCC    B
3  DDD    A
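
For completeness, here is a minimal self-contained sketch of the same idea, rebuilt from the example data in the question. The helper name take_per_type and its parameters are made up for illustration (they stand in for the question's do_something), and the sort step is only there to group the rows by Type as in the question's expected output.

```python
import pandas as pd

# Example data reconstructed from the question.
df = pd.DataFrame({
    "ID": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
    "Type": ["A", "B", "B", "A", "B", "A", "B"],
})


def take_per_type(frame, n, column="Type"):
    """Hypothetical helper: first n rows of each group, grouped together."""
    # head(n) keeps the first n rows per group, in their original row order.
    subset = frame.groupby(column).head(n)
    # A stable sort puts rows of the same Type next to each other
    # without reordering the rows inside each group.
    return subset.sort_values(column, kind="mergesort").reset_index(drop=True)


X = 2
new_df = take_per_type(df, X)
print(new_df)
```

Note that head(n) returns fewer than n rows for any Type that has fewer than n rows in total. If a random pick per Type is preferred over the first rows, df.groupby('Type').sample(n=2) should be an option on pandas 1.1 or later.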

Chris
