I have a larger table. Looks like this
df_a =
PRODUCT VERSION COUNTRY
0 ABC 100 AT
1 BCD 110 DE
2 ABC 300 GB
3 BCD 210 AT
4 CEF 700 DE
5 ABC 220 GB
6 ABC 300 DE
7 BCD 210 GB
8 BCD 210 DE
9 BCD 666 DE
10 CEF 777 DE
10 CEF 666 DE
i tryed to get the highest (max) version of a Variable (Product) by an other Variable (Country) and write it in a new Table like this.
COUNTRY(INDEX) ABC BCD CEF
AT 100 210 666
DE 300 110 777
GB 300 210 777
i thought this can't be so difficult and tried to do with groupby and combine the result with reduce.
Module = df.groupby(['PRODUCT', 'COUNTRY'], as_index=False)["VERSION"].max()
ABC = Module[Module["PRODUCT"]== 'ABC']
BCD = Module[Module["PRODUCT"]== 'BCD']
CEF = Module[Module["PRODUCT"]== 'CEF']
data_frames = [ABC, BCD, CEF]
result = reduce(lambda left,right: pd.merge(left,right,on=['COUNTRY'],
how='outer'), data_frames)
Output:
PPRODUCT_x COUNTRY VERSION_x PRODUCT_y VERSION_y PRODUCT_z VERSION_z
0 ABC AT 100.0 BCD 210.0 CEF 666
1 ABC DE 300.0 BCD 110.0 CEF 777
2 ABC GB 300.0 BCD 210.0 CEF 777
My result goes in the right direction, but is not very efficient - especially if the data set Variable 'Product' gets bigger. My biggest problem is currently the handling of the format. But I also want to know if there is another way to get a maximum value of a variable by another variable and transfer it to my preferred table?
Thanks in advance.