How can I create one data frame if I have multiple filters?

Question

I have a df:

df = pd.DataFrame({'Company': {0: 'KPMG', 1: 'Google', 2: 'LIC', 3: 'ABC', 4: 'Apple'},
 'Sector': {0: 'Finance', 1: 'Tech', 2: 'Finance', 3: 'Finance', 4: 'Tech'},
 'Cap': {0: 100, 1: 200, 2: 100, 3: 100, 4: 300}})

I need to calculate the ratio(new column that needs to be added) of cap per company depending on the 'Sector'.

For example: The total Cap for Finance Sector is 300 so that 'Ratio' for KPMG would be 100/300. Similarly I have to do for Tech. There might be new Company and Sectors that are added later so it has to be configurable that is why I am using the list format.

This is what I tried:

sector_list =['Finance','Tech']

for i in sector_list:
    total_cap_per_sector = df.loc[df['Sector'] == i, 'Cap'].sum()
    ratios = df.loc[df['Sector'] == i, 'Cap']/total_cap_per_sector
df['Ratio'] = calc
df.to_csv('new_df.csv')

The new_df looks like:

I get it that because it is a for loop it is only taking the last loop values so I am not getting the blanks for the Finance Sector.

I want to understand how can I get a df with the 'Ratio' values populated for all the sectors mentioned in the sector_list?

For the above example the end result should look like this:

in future, kindly do `df.to_dict()` and include your data in your question as code as I have also done in my answer.. I have edited your question for you! Please also no images. Kindly see: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples for more information on how to make a good `pandas` example that is reproducible. — David Erickson, Jan 12 '21 at 21:05

score 1 · Accepted Answer · answered Jan 12 '21 at 21:02

groupby and transform will do

df['ratio']=df.groupby('Sector')['Cap'].transform(lambda x:x/x.sum()).round(1)



Company   Sector  Cap  Ratio
0    KPMG  Finance  100    0.3
1  Google     Tech  200    0.4
2     LIC  Finance  100    0.3
3     ABC  Finance  100    0.3
4   Apple     Tech  300    0.6

David Erickson · Answer 2 · 2021-01-12T21:06:38.727

0

Take the series df['Cap'] and divide by the sum of each group using transform('sum') to create a calculated column as a series. The key is to use transform to make the series created from the groupby the same length as df['Cap']:

df = pd.DataFrame({'Company': {0: 'KPMG', 1: 'Google', 2: 'LIC', 3: 'ABC', 4: 'Apple'},
 'Sector': {0: 'Finance', 1: 'Tech', 2: 'Finance', 3: 'Finance', 4: 'Tech'},
 'Cap': {0: 100, 1: 200, 2: 100, 3: 100, 4: 300}})

df['Ratio'] = round(df['Cap'] / df.groupby('Sector')['Cap'].transform('sum'), 1)

df
Out[1]: 
  Company   Sector  Cap  Ratio
0    KPMG  Finance  100    0.3
1  Google     Tech  200    0.4
2     LIC  Finance  100    0.3
3     ABC  Finance  100    0.3
4   Apple     Tech  300    0.6

edited Jan 12 '21 at 21:06

answered Jan 12 '21 at 20:58

David Erickson

16,433
2
19
35

this is not working for me. The column 'Ratio' is only returning the sum of cap for each sector – Avi G Jan 12 '21 at 21:17
@AviG if you copy and paste the above code (including the input data), it provides the output you are seeking. From there if something is different in your actual data, then you can investigate the reason why and kindly let me know more specifically. – David Erickson Jan 12 '21 at 21:20
the answer from @wwne is working perfectly fine – Avi G Jan 12 '21 at 21:24

How can I create one data frame if I have multiple filters?

2 Answers2