Suppose I have this dataframe:

import pandas as pd

df = pd.DataFrame(
    [[1, 2.2, 3.1], [1, 1.5, 4.2], [1, 3.6, 7.0],
     [2, 1.9, 9.8], [2, 3.0, 7.1],
     [3, 1.1, 2.2], [3, 4.4, 5.6], [3, 2.1, 1.2]],
    columns=['id', 'A', 'B'])

df
    id   A      B
0   1   2.2     3.1
1   1   1.5     4.2
2   1   3.6     7.0
3   2   1.9     9.8
4   2   3.0     7.1
5   3   1.1     2.2
6   3   4.4     5.6
7   3   2.1     1.2

I need a way to traverse this dataframe and perform a summary calculation on each collection of rows that share a unique id.

Something similar to:

df1, df2, df3 = df.groupby('id')

But this is not possible because:

  1. there are over 1000 unique ids,
  2. each unpacked value would be an (id, dataframe) tuple, but I need a plain dataframe.

Expected output:

df1
    id   A      B
0   1   2.2     3.1
1   1   1.5     4.2
2   1   3.6     7.0

df2
   id    A       B
3   2   1.9     9.8
4   2   3.0     7.1

df3
    id   A      B
5   3   1.1     2.2
6   3   4.4     5.6
7   3   2.1     1.2

How do I do this?

1 Answer

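You can try DataFrame.groupby with exec:

# Split into a list of per-id dataframes (new name, so df is not overwritten)
groups = [x for _, x in df.groupby('id')]

# Bind each one to a module-level name df0, df1, ...
for i in range(len(groups)):
    exec(f'df{i} = groups[i].reset_index(drop=True)')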

Output:

df0
   id   A    B
0   1   2.2  3.1
1   1   1.5  4.2
2   1   3.6  7.0

df1
    id  A    B
0   2   1.9  9.8
1   2   3.0  7.1

df2
    id  A    B
0   3   1.1  2.2
1   3   4.4  5.6
2   3   2.1  1.2
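
With over 1000 ids, numbered variables may get hard to manage, so here is a possible alternative that avoids exec. It is only a sketch: the names frames and summary are my own, and the mean is just a stand-in for whatever summary calculation you actually need. It keeps each sub-dataframe in a dict keyed by id, and also shows that a per-id summary can be computed without splitting at all.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2.2, 3.1], [1, 1.5, 4.2], [1, 3.6, 7.0],
     [2, 1.9, 9.8], [2, 3.0, 7.1],
     [3, 1.1, 2.2], [3, 4.4, 5.6], [3, 2.1, 1.2]],
    columns=['id', 'A', 'B'])

# Iterating a groupby yields (key, sub-dataframe) pairs, so a dict
# comprehension maps each id to its own dataframe.
# "frames" is an illustrative name, not something from the question.
frames = {key: group for key, group in df.groupby('id')}

# Look up the rows for one id; the original row labels are kept.
print(frames[2])

# If only a per-id summary is needed, the split can be skipped.
# The mean is an assumed example of a "summary calculation".
summary = df.groupby('id')[['A', 'B']].mean()
print(summary)
```

Here frames[2] holds the two rows with id 2, and summary has one row per id, so the same code should work unchanged for any number of ids.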
Pawan Jain