1

I need to get sum of values from selected group of rows in DF, and after that calculate percentage for each row in a group.

df = pd.read_csv("file.csv")

names = df.groupby(["Names", "Action"]).size().reset_index(name='counts')

names.rename(columns={"counts":"Count"}, inplace=True)

Get DF like this:

  Name    Action     Count
0  Name1  Action 1      7
1  Name1  Action 2      3
2  Name1  Action 3    720
3  Name1  Action 4     95
4  Name1  Action 5    301
5  Name1  Action 6      1
6  Name2  Action 1     17
7  Name2  Action 2      1
8  Name3  Action 1      1
9  Name3  Action 2     23
10 Name3  Action 3    170

For each Name I have different action with corresponding number. I need to count total number of all types of Actions for each Name (for Name1 it will we 1127). After I need to add column "% of Total" to DF that will display % of each row value from total by each of given Names. Each "Name group" (Name1, Name2...) of rows inside "% of Total" column will have their own 100 % percentage.
It should look like this:

  Name    Action     Count  % of Total
0  Name1  Action 1      7    0.62
1  Name1  Action 2      3    0.27
2  Name1  Action 3    720    63.89
3  Name1  Action 4     95    8.43
4  Name1  Action 5    301    26.71
5  Name1  Action 6      1    0.09
6  Name2  Action 1     17    94.44
7  Name2  Action 2      1    5.56
8  Name3  Action 1      1    0.52
9  Name3  Action 2     23    11.86
10 Name3  Action 3    170    87.63

To get the total sum of all "Action" values for each Name I created a new DF:

df_total = df.groupby(["Name"]).size().reset_index(name='counts_new')

Df looks like this:

  Name  counts_new
0 Name1   1127
1 Name2   18
2 Name3   194
3 Name4   1377
4 Name5   93
5 Name6   1627
6 Name7   1355
7 etc... 

Now my plan is to add column to Df with some calculations. BUT I can't understand how to write a code where: each row with Name1 will take for calculation corresponding row (with Name1) from df_total.

When I get this (I understand it's not right), it just take first row from DF and perform calculation with first row from df_total, and than second, third. Which get all wrong results.

names["% of Total"] = df["Count"] * 100 / df_total["counts_new"]

   Name  Action    Count    % of Total
0 Name1  Action 1     7     0.621118
1 Name1  Action 2     3     16.666667
2 Name1  Action 3    720    371.134021
3 Name1  Action 4     95    6.899056
4 Name1  Action 5    301    323.655914
5 Name1  Action 6     1     0.061463
6 Name2  Action 1     17    1.254613
7 Name2  Action 2     1     0.125945 

Will appreciate any help. Thank you.

Chris Adams
  • 18,389
  • 4
  • 22
  • 39

2 Answers2

1

IIUC, use transform

df['pctg'] = df.Count / df.groupby(['Name']).Count.transform('sum') * 100

Outputs

    Name    Action      Count   pctg
0   Name1   Action 1    7       0.621118
1   Name1   Action 2    3       0.266193
2   Name1   Action 3    720     63.886424
3   Name1   Action 4    95      8.429459
4   Name1   Action 5    301     26.708075
5   Name1   Action 6    1       0.088731
6   Name2   Action 1    17      94.444444
7   Name2   Action 2    1       5.555556
8   Name3   Action 1    1       0.515464
9   Name3   Action 2    23      11.855670
10  Name3   Action 3    170     87.628866
rafaelc
  • 57,686
  • 15
  • 58
  • 82
0

Try this:

df['% of Total'] = (df['Count'] / df.groupby('Name')['Count'].transform('sum') * 100).round(2)

Output:

     Name    Action  Count  % of Total
0   Name1  Action 1      7        0.62
1   Name1  Action 2      3        0.27
2   Name1  Action 3    720       63.89
3   Name1  Action 4     95        8.43
4   Name1  Action 5    301       26.71
5   Name1  Action 6      1        0.09
6   Name2  Action 1     17       94.44
7   Name2  Action 2      1        5.56
8   Name3  Action 1      1        0.52
9   Name3  Action 2     23       11.86
10  Name3  Action 3    170       87.63
Scott Boston
  • 147,308
  • 15
  • 139
  • 187