How to remove N rows with specific value in column [Pandas]

Question

Here is my question

Let's say I have a dataframe df like this:

                         X          Y                   Class
0                       10        -10                       0
1                       20          3                       0
2                       15          5                       1
3                       29          9                       1
4                       31        -12                       0
5                       14         10                       1
6                       22          3                       0
7                       11          5                       0

As you can see, 3 rows have the value 1 in the 'Class' column and 5 rows have the value 0. I want the same number of rows for each Class value, so in this example I want to randomly remove 2 of the rows where 'Class' is 0 and get output like this:

                         X          Y                   Class
0                       10        -10                       0
1                       15          5                       1
2                       29          9                       1
3                       31        -12                       0
4                       14         10                       1
5                       11          5                       0

Can anyone help me solve this?
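
For reference, here is a minimal sketch of the requirement described above: count the rows per class, work out how many surplus rows the larger class has, and drop that many at random. It assumes class 0 is the larger one, as in the example. The names `n_to_drop`, `drop_idx` and `balanced`, and the fixed `random_state`, are illustrative choices and not part of the question.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "X": [10, 20, 15, 29, 31, 14, 22, 11],
    "Y": [-10, 3, 5, 9, -12, 10, 3, 5],
    "Class": [0, 0, 1, 1, 0, 1, 0, 0],
})

# Number of surplus rows in class 0 (5 - 3 = 2 for this data)
counts = df["Class"].value_counts()
n_to_drop = int(counts.loc[0] - counts.loc[1])

# Pick that many class-0 rows at random; random_state is fixed only
# to make the sketch reproducible (an assumption, drop it for true randomness)
drop_idx = df[df["Class"] == 0].sample(n=n_to_drop, random_state=0).index

# Remove the chosen rows and renumber the index from 0
balanced = df.drop(drop_idx).reset_index(drop=True)
print(balanced)
```

Which two rows are removed depends on the random draw, so the result may not match the sample output row for row, but both classes end up with 3 rows.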

@YevhenKuzmovych Almost. When I write `g = df.groupby('Class')` and then `g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))`, and after that try `X = g.loc[:, g.columns != 'Class'].to_numpy()`, I get an error: AttributeError: 'DataFrameGroupBy' object has no attribute 'loc' — Stevan Cakic, Jun 16 '21 at 11:35
You need to assign the result back to a variable, because `g` is still the groupby object: `df = ...` or `g = ...` — Erfan, Jun 16 '21 at 11:38
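
The point made in the comments can be sketched as follows: the sampled result has to be stored in a variable, because the groupby object itself has no `.loc`. This sketch swaps the `apply` + `lambda` from the comment for `DataFrameGroupBy.sample` (available in pandas 1.1 and later), which samples within each group directly. The name `balanced` and the fixed `random_state` are assumptions made for illustration.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "X": [10, 20, 15, 29, 31, 14, 22, 11],
    "Y": [-10, 3, 5, 9, -12, 10, 3, 5],
    "Class": [0, 0, 1, 1, 0, 1, 0, 0],
})

g = df.groupby("Class")

# Size of the smallest class (3 for this data)
n = int(g.size().min())

# Sample n rows from every class and keep the result in a new variable;
# g itself is still a DataFrameGroupBy and has no .loc
balanced = g.sample(n=n, random_state=0).sort_index().reset_index(drop=True)

# Feature matrix without the 'Class' column, now taken from the DataFrame
X = balanced.loc[:, balanced.columns != "Class"].to_numpy()
print(balanced)
print(X.shape)
```

`sort_index()` keeps the surviving rows in their original order before the index is reset; leave it out if the order does not matter.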
