2

I'm trying to calculate the ratio by columns in python.

import pandas as pd
import numpy as np

data={
    'category': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'value 1': [1, 1, 2, 5, 3, 4, 4, 8, 7],
    'value 2': [4, 2, 8, 5, 7, 9, 3, 4, 2]
}
data=pd.DataFrame(data)
data.set_index('category')
#        value 1    value 2
#category       
#       A      1          4
#       B      1          2
#       C      2          8
#       D      5          5
#       E      3          7
#       F      4          9
#       G      4          3
#       H      8          4
#       I      7          2

The expected results is as below:

#The sum of value 1: 35, value 2: 44
#The values in the first columns were diveded by 35, and the second columns were divded by 44
#            value 1    value 2
#category       
#       A      0.028      0.090
#       B      0.028      0.045
#       C      0.057      0.181
#       D      0.142      0.113
#       E      0.085      0.159
#       F      0.114      0.204
#       G      0.114      0.068
#       H      0.228      0.090
#       I      0.2        0.045

I tried to run the below code, but it returned NaN values:

data=data.apply(lambda x:x/data.sum())
data

I think that there are simpler methods for this job, but I cannot search the proper keywords..
How can I calculate the ratio in each column?

Ssong
  • 184
  • 1
  • 10
  • 1
    `data = data.set_index('category'); data/data.sum()`? – Sayandip Dutta Apr 12 '22 at 06:42
  • @SayandipDutta Oh, it works! Then.. do you know why apply function didn't work? – Ssong Apr 12 '22 at 06:50
  • 2
    Yes, because `data.set_index` by itself doesn't work inplace unless you mention `inplace=True` in `set_index`. So it couldn't compute the `sum` and division operation for `category` column as it is in string. Also, it is advisable to not use `apply` when you can have a vectorized solution. [`apply` is slow](https://stackoverflow.com/a/54432584/5431791) – Sayandip Dutta Apr 12 '22 at 07:03
  • @SayandipDutta Thank you for your kind reply! I read the post that you attached, and I understand the mechanism of the 'apply' function. Thanks a lot! – Ssong Apr 12 '22 at 07:54

1 Answers1

1

The issue is that you did not make set_index permanent.

What I usually do to ensure I do correct things is using pipelines

data=pd.DataFrame(data)
dataf =(
 data
  .set_index('category')
  .transform(lambda d: d/d.sum())
)

print(dataf)

By piping commands, you get what you want. Note: I used transform instead of apply for speed.

They are easy to read, and less prune to mistake. Using inplace=True is discouraged in Pandas as the effects could be unpredictable.

Prayson W. Daniel
  • 14,191
  • 4
  • 51
  • 57
  • Thank you for your comment! Although I'm not familiar with piping, I'll practice it for future works. – Ssong Apr 12 '22 at 08:10