I have a pandas DataFrame df that contains a column ID and a column Type. Here is an example:

print(df)

>>
+---------+---------+
|       ID|     Type|
+---------+---------+
|      AAA|        A|
|      BBB|        B|
|      CCC|        B|
|      DDD|        A|
|      EEE|        B|
|      FFF|        A|
|      GGG|        B|
+---------+---------+

From that DataFrame, I want to extract a sub-DataFrame containing X rows with distinct ID values for each Type.

Here is the expected result for the previous example (the order does not matter):

X = 2
new_df = do_something(df, X)
print(new_df)

>>
+---------+---------+
|       ID|     Type|
+---------+---------+
|      AAA|        A|
|      DDD|        A|
|      BBB|        B|
|      CCC|        B|
+---------+---------+

Is there a simple way to do that?

Nakeuh

1 Answer

Use groupby followed by head (DataFrameGroupBy.head), which keeps the first n rows of each group:

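import pandas as pd

# Keep the first X rows of each Type group; rows keep their original order and index.
X = 2
new_df = df.groupby('Type').head(X)
print(new_df)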

Output:

    ID Type
0  AAA    A
1  BBB    B
2  CCC    B
3  DDD    A
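
For completeness, here is a minimal self-contained sketch that rebuilds the example DataFrame from the question and wraps the call in the do_something(df, X) function the question uses as a placeholder. The drop_duplicates step is an assumption added here: it only matters if the same ID can appear more than once within a Type, which the example data does not show.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "ID": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
    "Type": ["A", "B", "B", "A", "B", "A", "B"],
})


def do_something(df, x):
    """Return at most x rows with distinct IDs for each Type."""
    # Assumption: drop repeated (ID, Type) pairs first so that the rows
    # kept for each Type really have distinct IDs.
    unique_rows = df.drop_duplicates(subset=["ID", "Type"])
    # head(x) keeps the first x rows of every group and preserves
    # the original row order and index.
    return unique_rows.groupby("Type").head(x)


X = 2
new_df = do_something(df, X)
# A stable sort groups the rows by Type, as in the question's expected output.
print(new_df.sort_values("Type", kind="stable"))
```

If a Type has fewer than X rows, head simply returns all of them, so no special handling should be needed for small groups.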
Chris