0

hi I have a dataset that looks much like this data frame below:

#Table1 :
print("Table1: Current Table")
data = [['ALFA', 35, 47, 67, 44, 193],  
        ['Bravo', 51, 52, 16, 8, 127], 
        ['Charlie', 59, 75, 2, 14, 150],  
        ['Delta', 59, 75, 2, 34, 170],
        ['Echo', 59, 75, 2, 14, 150],
        ['Foxtrot', 40, 43, 26, 27, 136], 
        ['Golf', 35, 31, 22, 13, 101], 
        ['Hotel', 89, 58, 24, 34, 205]]

df = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])
#df.loc[:,'Total'] = df.sum(axis=1)
print(df)

i would want to get the percentage of all cells against their row totals (calculated in column 'Total') such that it looks this:

#Table2 :
print('Table2: Expected Outcome')
data2 = [['ALFA',18.1, 24.4, 34.7, 22.8, 193], 
        ['Bravo',40.2, 40.9, 12.6, 6.3, 127], 
        ['Charlie',39.3, 50.0, 1.3, 9.3, 150], 
        ['Delta',34.7, 44.1, 1.2, 20.0, 170],
        ['Echo',39.3, 50.0, 1.3, 9.3, 150],
        ['Foxtrot',29.4, 31.6, 19.1, 19.9, 136],
        ['Hotel',34.7, 30.7, 21.8, 12.9, 101], 
        ['Golf',43.4, 28.3, 11.7, 16.6, 205]]
df2 = pd.DataFrame(data2, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total']) #.round(decimals=1)
#df.loc[:,'Total'] = df.sum(axis=1)
print(df2)

I am not really interested if the total column does change, is recalculated or have to be dropped in the process; but for completeness sake it would be good to have a 'Total' column along with the cells' percentages

3 Answers3

-1

Use fast vecorized division all columns without Objects/Total by DataFrame.div:

c = df.columns.difference(['Objects','Total'])
df[c] = df[c].div(df['Total'], axis=0).mul(100)
print(df)
   Objects    Column1    Column2    Column3    Column4  Total
0     ALFA  18.134715  24.352332  34.715026  22.797927    193
1    Bravo  40.157480  40.944882  12.598425   6.299213    127
2  Charlie  39.333333  50.000000   1.333333   9.333333    150
3    Delta  34.705882  44.117647   1.176471  20.000000    170
4     Echo  39.333333  50.000000   1.333333   9.333333    150
5  Foxtrot  29.411765  31.617647  19.117647  19.852941    136
6     Golf  34.653465  30.693069  21.782178  12.871287    101
7    Hotel  43.414634  28.292683  11.707317  16.585366    205
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
-1

you can try using apply :

df[['Column1', 'Column2', 'Column3', 'Column4']] = df[['Column1', 'Column2', 'Column3', 'Column4']].apply(lambda x: x/x.sum(), axis=1)

Output :

Table1: Current Table
   Objects   Column1   Column2   Column3   Column4  Total
0     ALFA  0.181347  0.243523  0.347150  0.227979    193
1    Bravo  0.401575  0.409449  0.125984  0.062992    127
2  Charlie  0.393333  0.500000  0.013333  0.093333    150
3    Delta  0.347059  0.441176  0.011765  0.200000    170
4     Echo  0.393333  0.500000  0.013333  0.093333    150
5  Foxtrot  0.294118  0.316176  0.191176  0.198529    136
6     Golf  0.346535  0.306931  0.217822  0.128713    101
7    Hotel  0.434146  0.282927  0.117073  0.165854    205
grymlin
  • 492
  • 1
  • 9
  • 1
    https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code/54432584#54432584 – jezrael Sep 06 '22 at 11:46
-2

Create a new dataframe using the same data. Loop through all columns of the dataframe except the last column(ie, Total) using df.columns[1:-1] and compute the percentage.

df1=pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])

for col in df.columns[1:-1]:
    df1[col]=(df[col]*100/df.Total)

df1
me280872
  • 3
  • 3
  • Please don't post only code as answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes. – Mark Rotteveel Sep 12 '22 at 09:13