0

I have a larger table. Looks like this

    df_a =

     PRODUCT VERSION  COUNTRY
0      ABC     100     AT
1      BCD     110     DE
2      ABC     300     GB
3      BCD     210     AT
4      CEF     700     DE
5      ABC     220     GB
6      ABC     300     DE
7      BCD     210     GB
8      BCD     210     DE
9      BCD     666     DE
10     CEF     777     DE
10     CEF     666     DE

i tryed to get the highest (max) version of a Variable (Product) by an other Variable (Country) and write it in a new Table like this.

COUNTRY(INDEX)  ABC  BCD  CEF 
AT              100  210  666
DE              300  110  777  
GB              300  210  777

i thought this can't be so difficult and tried to do with groupby and combine the result with reduce.

Module = df.groupby(['PRODUCT', 'COUNTRY'], as_index=False)["VERSION"].max()

ABC = Module[Module["PRODUCT"]== 'ABC']
BCD = Module[Module["PRODUCT"]== 'BCD']
CEF = Module[Module["PRODUCT"]== 'CEF']


data_frames = [ABC, BCD, CEF]
result = reduce(lambda  left,right: pd.merge(left,right,on=['COUNTRY'],
                                            how='outer'), data_frames)

Output:

        PPRODUCT_x COUNTRY  VERSION_x PRODUCT_y  VERSION_y PRODUCT_z  VERSION_z 
0          ABC      AT      100.0         BCD      210.0         CEF   666
1          ABC      DE      300.0         BCD      110.0         CEF   777
2          ABC      GB      300.0         BCD      210.0         CEF   777

My result goes in the right direction, but is not very efficient - especially if the data set Variable 'Product' gets bigger. My biggest problem is currently the handling of the format. But I also want to know if there is another way to get a maximum value of a variable by another variable and transfer it to my preferred table?

Thanks in advance.

Dalando
  • 11
  • 2

0 Answers0