I have a DataFrame with a number of companies; each company is assigned to one of five groups (ESG Bottom, ESG 4, ESG 3, ESG 2 or ESG Top).

Now I need to compute the weight of each company within its group, so that I can then compute each company's weighted return and, from those, the overall group return.

I have managed to use groupby to get each group's summed market capitalization (MC) and each company's individual MC.

My question is: how can I now divide each company's individual MC by the total MC of its group and save the resulting weight in the "Weight in Quintile PF" column?

I could probably manage this with a clumsy and slow for loop, but there must be a more elegant one-liner (or two-liner) that divides each individual company MC by the respective total group MC.

9Morgan8
  • kindly provide a sample dataframe, with expected output. The answer to that should guide you in your actual data – sammywemmy Jun 17 '23 at 05:42
  • Is there a best practice on how to upload the data for download or just provide some pd.DataFrame(data) example? – 9Morgan8 Jun 17 '23 at 05:50
  • [this page](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) should guide you – sammywemmy Jun 17 '23 at 05:52
  • I made a sample of 10 rows for replication, but I cannot get `.to_dict()` to print out the whole sample df. I spent about 20 minutes searching for a fix and could not find one. `df_sample.to_dict()` only returns one row, which is rather useless for replication. Is there a way to circumvent this restriction? I tried `pd.set_option('display.max_columns', None)`, `pd.set_option('display.width', None)` and `pd.set_option('display.max_colwidth', None)`, but those did not fix the issue that only one row of df_sample is displayed. – 9Morgan8 Jun 17 '23 at 06:36
  • trim your dataframe to just the columns needed, or a small chunk (say quintile PF, MC). then something like `df.loc[10].to_dict()` should be fine – sammywemmy Jun 17 '23 at 09:25
  • @sammywemmy Would this be enough? Furthermore what is the reason for this restriction in `.to_dict()` ? – 9Morgan8 Jun 19 '23 at 13:36

1 Answer

I figured out a way to do it with a clumsy for loop and a boolean mask:

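    for label in labels:
        # Boolean mask selecting the rows that belong to this quintile portfolio
        mask = df_single_MC['Quintile PF'] == label
        # Assign through .loc so the write hits the original frame, not a chained-indexing copy
        df_single_MC.loc[mask, 'Weight in Quintile PF'] = df_single_MC.loc[mask, 'MC'] / quintile_MCs[label]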

It is not pretty, but it solves my problem; the second part of this answer helped me.

I would still be curious whether there is a neater solution without a for loop, e.g. using apply and/or lambda, since I have not fully figured those out yet.
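For what it is worth, a loop-free version might look like the sketch below. It assumes the column names from the question (`Quintile PF`, `MC`, `Weight in Quintile PF`); the sample values are made up purely for illustration, and `group_total` and `weights_via_map` are hypothetical helper names. The idea is that `groupby(...).transform("sum")` returns the group total aligned to every original row, so the division becomes a plain element-wise operation.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df_single_MC = pd.DataFrame({
    "Quintile PF": ["ESG Top", "ESG Top", "ESG Bottom", "ESG Bottom", "ESG Bottom"],
    "MC": [100.0, 300.0, 50.0, 50.0, 100.0],
})

# transform("sum") broadcasts each group's total MC back onto its member rows,
# so the result has the same index and length as the original frame.
group_total = df_single_MC.groupby("Quintile PF")["MC"].transform("sum")
df_single_MC["Weight in Quintile PF"] = df_single_MC["MC"] / group_total

# Alternative when the per-group totals already exist as a Series indexed by
# group label (as quintile_MCs appears to be in the loop above):
quintile_MCs = df_single_MC.groupby("Quintile PF")["MC"].sum()
weights_via_map = df_single_MC["MC"] / df_single_MC["Quintile PF"].map(quintile_MCs)

print(df_single_MC)
```

Both variants should give weights that sum to 1 within each group. The `transform` form is probably the closest to the one-liner asked for, while the `map` form reuses totals that have already been computed.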

9Morgan8