Using Python and Pandas, I have a dataframe filled with numerical values. What I am trying to do, and can't figure out, is how to return a new data frame where each number represents its percentage of that row.

Essentially, I need a new data frame where each number from the old data frame is replaced by the percentage it represents of that row's total. Hope that makes sense.

Below is an example of the starting data frame. To keep the example simple, each row totals 10.

            ambivalent  negative  neutral  positive
11/15/2021           6         2        1         1
11/8/2021            4         1        2         3

What I want to achieve is this:

            ambivalent  negative  neutral  positive
11/15/2021         60%       20%      10%       10%
11/8/2021          40%       10%      20%       30%

I don't need the % symbol; the plain percentage numbers will work.

Can someone point me in the right direction in how to do this?

  • Just divide by row sums: `new_df = df.div(df.sum(axis=1), axis=0)` you can multiply by 100 if needed too `new_df = df.div(df.sum(axis=1), axis=0) * 100` – Henry Ecker Dec 18 '21 at 00:16
  • If you really want the % sign, you could do `new_df = df.div(df.sum(axis=1), axis=0).mul(100).astype(str).add('%')` like [this answer](https://stackoverflow.com/a/45989890/15497888) – Henry Ecker Dec 18 '21 at 00:18
  • You could also play with the display settings instead of making them strings if you need the numerical values for computation. See [this answer](https://stackoverflow.com/a/31671975/15497888) and [Options and settings](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html). – Henry Ecker Dec 18 '21 at 00:21
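Putting the first two comments together on the question's own data gives the sketch below. Treating the date strings as the index is an assumption about how the original frame is laid out. The last step illustrates the display-settings idea from the third comment using pandas' `display.float_format` option, so the values stay numeric and only the printed output shows a % sign.

```python
import pandas as pd

# The question's example frame; using the date strings as the index is an
# assumption about how the original data is laid out.
df = pd.DataFrame(
    {
        "ambivalent": [6, 4],
        "negative": [2, 1],
        "neutral": [1, 2],
        "positive": [1, 3],
    },
    index=["11/15/2021", "11/8/2021"],
)

# Divide every value by its row total. axis=0 aligns the row-sum Series with
# the DataFrame's index, so each row is divided by its own sum.
pct = df.div(df.sum(axis=1), axis=0).mul(100)
print(pct)

# Keep the numbers as floats but print them with a % sign (display only).
with pd.option_context("display.float_format", "{:.0f}%".format):
    print(pct)
```

Because the % sign is applied only at display time, `pct` can still be used for further arithmetic, unlike the `.astype(str).add('%')` version.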

1 Answer

You can use the .apply() method with a lambda function:

result = df.apply(lambda row: row/sum(row)*100,axis=1)

Example:

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 5], 'c': [5, 2]})
result = df.apply(lambda row: row / sum(row) * 100, axis=1)

df is:

   a  b  c
0  2  3  5
1  3  5  2

result is:

      a     b     c
0  20.0  30.0  50.0
1  30.0  50.0  20.0
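The row-wise `apply` produces the same numbers as the vectorized `df.div(df.sum(axis=1), axis=0)` suggested in the comments on the question. A quick sketch checking that on this answer's example; the vectorized form is usually preferable on large frames, since `apply` with `axis=1` calls the Python lambda once per row.

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 5], 'c': [5, 2]})

# Row-wise apply, as in the answer above.
via_apply = df.apply(lambda row: row / row.sum() * 100, axis=1)

# Vectorized equivalent from the comments: divide each row by its row sum.
via_div = df.div(df.sum(axis=1), axis=0) * 100

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_apply, via_div)
print(via_div)
```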