1

I am trying to figure out how to use pandas groupby.

My dataframe looks like this

index ID Code Quantity
0    400 ZU   5000
1    400 ZO   12
2    401 ZU   9000
3    401 ZO   15
4    402 ZU   1000
5    403 ZO   15

I'd like to groupby ID and Code and create an extra column which devides the Quantity of ZU by the quantity of ZO, but only if the groupby has an entry for both. If one is missing the data can be dropped.

I know I can groupby multiple columns like:

data.groupby(['ID', 'Code'])

But I've no idea how to proceed from there, since I don't need the .sum() or the .mean() valuze.

The expected output could be something like this:

index ID value
0    400 416.17
1    401 600
jpp
  • 159,742
  • 34
  • 281
  • 339
Simmen
  • 93
  • 2
  • 8

2 Answers2

0

Strictly speaking, you don't need groupby here. You can filter your dataframe and use set_index to create two mappings. Then divide one by the other and remove null values.

zu = df.loc[df['Code'] == 'ZU'].set_index('ID')['Quantity']
zo = df.loc[df['Code'] == 'ZO'].set_index('ID')['Quantity']

res = (zu / zo).dropna().reset_index()

print(res)

    ID    Quantity
0  400  416.666667
1  401  600.000000
jpp
  • 159,742
  • 34
  • 281
  • 339
0

Use pivot with dropna, then divide by div:

df1 = df.pivot('ID', 'Code', 'Quantity').dropna()
print (df1)
Code    ZO      ZU
ID                
400   12.0  5000.0
401   15.0  9000.0

df2 = df1['ZU'].div(df1['ZO']).reset_index(name='val')
print (df2)
    ID         val
0  400  416.666667
1  401  600.000000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252