
Here is my question

Let's say I have a dataframe df like this:

                         X          Y                   Class
0                       10        -10                       0
1                       20          3                       0
2                       15          5                       1
3                       29          9                       1
4                       31        -12                       0
5                       14         10                       1
6                       22          3                       0
7                       11          5                       0   

As you can see, 3 rows have value 1 in the 'Class' column and 5 rows have value 0. I want the same number of rows for each 'Class' value (in this example, that means randomly removing 2 of the rows where 'Class' is 0), so that the output looks like this:

                         X          Y                   Class
0                       10        -10                       0
1                       15          5                       1
2                       29          9                       1
3                       31        -12                       0
4                       14         10                       1
5                       11          5                       0   

Can anyone help me solve this?

Stevan Cakic
  • @YevhenKuzmovych Almost. When I write `g = df.groupby('Class')` followed by `g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))`, and then try `X = g.loc[:, g.columns != 'Class'].to_numpy()`, I get an error: `AttributeError: 'DataFrameGroupBy' object has no attribute 'loc'` – Stevan Cakic Jun 16 '21 at 11:35
  • You need to overwrite your variable. `df = ...` or `g = ...` – Erfan Jun 16 '21 at 11:38
  • @Erfan Oh yes, sorry. Thank you :) – Stevan Cakic Jun 16 '21 at 11:40
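
Putting the comments together, here is a minimal sketch of one way to balance the classes, assuming pandas 1.1 or later. It uses `DataFrameGroupBy.sample` rather than the `apply`/lambda form quoted above, and it assigns the result to a new variable so that `.loc` is called on a DataFrame instead of on the groupby object. The names `n_per_class` and `balanced` and the fixed `random_state` are illustrative choices, not part of the original question.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "X": [10, 20, 15, 29, 31, 14, 22, 11],
    "Y": [-10, 3, 5, 9, -12, 10, 3, 5],
    "Class": [0, 0, 1, 1, 0, 1, 0, 0],
})

g = df.groupby("Class")

# Size of the smallest class (3 in this example).
n_per_class = int(g.size().min())

# Draw the same number of rows from every class, then restore the
# original row order and renumber the index.
# random_state is fixed here only to make the sketch reproducible.
balanced = (
    g.sample(n=n_per_class, random_state=0)
    .sort_index()
    .reset_index(drop=True)
)

# The result is a regular DataFrame, so .loc works as expected.
X = balanced.loc[:, balanced.columns != "Class"].to_numpy()
y = balanced["Class"].to_numpy()
```

Which rows of the larger class are dropped depends on the random seed, so the exact rows may differ from the example output in the question; only the per-class counts are guaranteed to match.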
