I have a dataframe

     O   D  counts
0   G1  G1    8576
1   G1  G2    4213
2   G1  G3    8762
3   G2  G1    8476
4   G2  G2    2134
...

But each group has a different population in O and in D. For example:

G1 in O has, say, 1234 different members, while G1 in D has 4321.

How do I normalize the above table using pandas?

Dervin Thunk

1 Answer

It seems you need to reshape first and then normalize:

df = df.set_index(['O','D'])['counts'].unstack(fill_value=0)
print(df)
D     G1    G2    G3
O                   
G1  8576  4213  8762
G2  8476  2134     0

df1 = (df - df.mean()) / (df.max() - df.min())
print(df1)
D    G1   G2   G3
O                
G1  0.5  0.5  0.5
G2 -0.5 -0.5 -0.5
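
The normalization above subtracts each column's mean and divides by that column's range (max minus min). For anyone who wants to run it end to end, here is a minimal self-contained sketch. It assumes only the five rows visible in the question (the elided rows are not guessed), and the names `wide`, `col_range`, `normalized` and `long_df` are illustrative, not from the original code.

```python
import pandas as pd

# Only the five rows shown in the question; the rows behind "..." are omitted.
df = pd.DataFrame({
    "O": ["G1", "G1", "G1", "G2", "G2"],
    "D": ["G1", "G2", "G3", "G1", "G2"],
    "counts": [8576, 4213, 8762, 8476, 2134],
})

# Pivot to an O x D matrix; pairs that never occur (here G2 -> G3) become 0.
wide = df.set_index(["O", "D"])["counts"].unstack(fill_value=0)

# Column-wise mean normalization: (x - column mean) / (column max - column min).
col_range = wide.max() - wide.min()
normalized = (wide - wide.mean()) / col_range

# Reshape back to one row per (O, D) pair.
long_df = normalized.stack().reset_index(name="count")
print(long_df)
```

Note that with only two origin groups every column has just two values, so this formula can only ever produce 0.5 and -0.5; the result becomes more informative once more groups are present.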

Finally, reshape back:

print(df1.stack().reset_index(name='count'))

    O   D  count
0  G1  G1    0.5
1  G1  G2    0.5
2  G1  G3    0.5
3  G2  G1   -0.5
4  G2  G2   -0.5
5  G2  G3   -0.5
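
The steps above do not use the group populations mentioned in the question. If the goal is instead to express the counts relative to those populations, one possible reading is to divide each count by the size of its origin group, its destination group, or both. The sketch below assumes the populations are available as two lookup Series, here called `pop_O` and `pop_D` (hypothetical names). Only G1's values (1234 in O, 4321 in D) come from the question; every other population is a made-up placeholder.

```python
import pandas as pd

# Only the five rows shown in the question.
df = pd.DataFrame({
    "O": ["G1", "G1", "G1", "G2", "G2"],
    "D": ["G1", "G2", "G3", "G1", "G2"],
    "counts": [8576, 4213, 8762, 8476, 2134],
})

# Hypothetical population sizes per group. Only G1's numbers are from the
# question; the rest are placeholders chosen for illustration.
pop_O = pd.Series({"G1": 1234, "G2": 1000})
pop_D = pd.Series({"G1": 4321, "G2": 2000, "G3": 3000})

# Look up the population that belongs to each row's origin and destination.
o_size = df["O"].map(pop_O)
d_size = df["D"].map(pop_D)

# Counts per member of the origin group and per member of the destination group.
df["per_O_member"] = df["counts"] / o_size
df["per_D_member"] = df["counts"] / d_size

# One possible joint normalization: counts per possible (origin, destination) pair.
df["per_pair"] = df["counts"] / (o_size * d_size)
print(df)
```

Which of these columns is the right one depends on what the counts represent, so treat this as a starting point rather than the intended answer.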
jezrael