How to group by specific columns in pandas so that each group has at most n rows? Also delete the groups from the original dataframe IF POSSIBLE?

Question

I have a df as follows:

a   b   c
x   2   3
y   2   3
z   3   2
w   1   5
(up to thousands of records)

I want to group this dataframe by columns b and c such that each group has at most n rows. If a group has more than n rows, the extra rows should go into a new group. That is the main problem statement. I would also like to delete these groups from the original dataframe, if possible.

Sample output (with a little more explanation):

I basically want to loop over the df, and I am currently using the following code:

for x,y in df.groupby(['b','c']):
    print(y)

With this code I'm getting the following groups:
a   b   c
x   2   3
y   2   3

a   b   c
z   3   2

a   b   c
w   1   5
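For reference, iterating over a pandas GroupBy object yields (key, sub-DataFrame) pairs. When you group by a list of columns, each key is a tuple of those column values, and the groups come out sorted by key because groupby uses sort=True by default. A minimal sketch with the sample data (the names keys and sizes are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})

keys = []
sizes = []
for key, group in df.groupby(['b', 'c']):
    # key is a (b, c) tuple; group is a DataFrame holding that group's rows
    keys.append(key)
    sizes.append(len(group))

print(keys)   # sorted by (b, c)
print(sizes)  # number of rows in each group
```

This is why the (2, 3) group above contains both x and y: groupby alone never caps the group size.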

Now let's say I want only 1 (n = 1) row in each group. This is the output I'm looking for:
a   b   c
x   2   3

a   b   c
y   2   3

a   b   c
z   3   2

a   b   c
w   1   5

(And, if possible, delete these groups from the df as well.)

Thank you!

Can you give sample output? – MAFiA303, Jul 03 '22 at 10:52

score 1 · Accepted Answer · answered Jul 03 '22 at 15:13

Starting from the accepted answer here, I have modified the code for your question:

import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})

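n = 1  # maximum number of rows per group

for x, y in df.groupby(['b','c']):
    # slice each (b, c) group y into consecutive chunks of at most n rows
    list_df = [y[i: i+n] for i in range(0, y.shape[0], n)]
    for i in list_df:
        print(i)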

#a  b  c
#w  1  5
#
#a  b  c
#x  2  3
#
#a  b  c
#y  2  3
#
#a  b  c
#z  3  2

This splits each (b, c) group into consecutive chunks of at most n rows. If you want to delete each chunk from the dataframe as you go, you could add df = df.drop(i.index), which drops those rows by their index labels (the labels are carried through from the original dataframe):

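for x,y in df.groupby(['b','c']):
    list_df = [y[i: i+n] for i in range(0, y.shape[0], n)]
    for i in list_df:
        print(i)
        # drop this chunk's rows by their index labels; the groupby was built
        # from the original df, so reassigning df does not affect the loop
        df = df.drop(i.index)
        print(df)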
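As a loop-free alternative for the splitting step (a different technique from the slicing above, not part of the original answer), GroupBy.cumcount() numbers the rows within each (b, c) group as 0, 1, 2 and so on. Integer-dividing that number by n gives a chunk number, and grouping by b, c plus the chunk number yields groups of at most n rows. A sketch, where the helper column name chunk is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})
n = 1

# position of each row within its (b, c) group: 0, 1, 2, ...
position = df.groupby(['b', 'c']).cumcount()
# hypothetical helper column: positions 0..n-1 -> chunk 0, n..2n-1 -> chunk 1, etc.
df['chunk'] = position // n

chunks = []
for key, group in df.groupby(['b', 'c', 'chunk']):
    # every group here has at most n rows
    part = group.drop(columns='chunk')
    print(part)
    chunks.append(part)
```

With n = 1 this produces the same four single-row groups (w, x, y, z) as the loop above; changing n changes the chunk size without touching the rest of the code.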

Thanks! This code does work, but is it advisable to drop rows from a df while looping over it? – Pallav Doshi, Jul 04 '22 at 12:07
No problem. I would avoid dropping rows in a loop if you select them by position with `.iloc[]`, because positions shift after each drop and the results might differ from what you expect. If you are dropping by specific, fixed index labels (as here), it isn't a problem. – Rawson, Jul 07 '22 at 20:14
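To illustrate the point above: after a row is dropped, positions are recomputed, so a position chosen before the loop can point at a different row afterwards, whereas index labels stay attached to their rows. One common way to sidestep the question entirely is to collect the labels inside the loop and call drop once afterwards. A minimal sketch; n = 2 and the "only remove full chunks" rule are hypothetical choices made just so that some rows are left behind:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})

# Pitfall: positions are recomputed after every drop.
positional = df.copy()
for pos in [0, 1]:  # intended to remove the rows originally at positions 0 and 1 (x and y)
    positional = positional.drop(positional.index[pos])
# after the first drop, position 1 pointed at z, so x and z were removed instead

# Safer: collect index labels during the loop, drop once afterwards.
n = 2
to_drop = []
for key, group in df.groupby(['b', 'c']):
    for start in range(0, len(group), n):
        chunk = group.iloc[start:start + n]
        if len(chunk) == n:  # hypothetical rule: only remove full chunks
            print(chunk)
            to_drop.extend(chunk.index)
remaining = df.drop(index=to_drop)
```

Here only the (2, 3) group fills a chunk of 2, so x and y are removed and z and w remain in remaining.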

