-3

I have a dataframe that looks after using the pandas group function:

pandas dataframe

group. letters.  counts.   sum_1.    sum_2
1.     a.        20        3.        2
1.     b.        40.       2.        3
2.     a.        60.       3.        5
2.     b.        90.       5.        4

and I want to automate it without having to do it in Excel to make something like this: excel

group. letters.  counts.   sum_1.        sum_1_%.   sum_2.   sum_2rate
    1.     a.        20        3.        5%          2       3.33%
    1.     b.        40.       2.        3.33%       3       5%
    total.           60        5                     5
    2.     a.        60.       3.                    5
    2.     b.        90.       5.                    4
  • No, I'm trying to find the total of counts in each group and sum_1 divided by the total. – emmmmmmmm12 Jul 28 '20 at 00:48
  • Same as sum_2, don't know if there's a way to do this in Pandas without going into excel. I have about 200 groups. – emmmmmmmm12 Jul 28 '20 at 00:50
  • What are you trying to achieve? How do you get sum1% ? You could get a df with groupby('group') and append it, then sort... but I recommend to do that on a temp df used only for reporting, so you keep data intact – RichieV Jul 28 '20 at 01:48

1 Answers1

1

yes using Trenton link, plus some other stuff you can achieve what you want. you probably is just lazy to do more research.

d = [{'group': 1, 'letters': 'a', 'counts': 20, 'sum1': 3, 'sum2': 2},
    {'group': 1, 'letters': 'b', 'counts': 40, 'sum1': 2, 'sum2': 3},
    {'group': 2, 'letters': 'a', 'counts': 60, 'sum1': 3, 'sum2': 5},
    {'group': 2, 'letters': 'b', 'counts': 90, 'sum1': 5, 'sum2': 4}
]
df = pd.DataFrame(d, columns=['group', 'letters', 'counts', 'sum1', 'sum2']) # this is your original df
#print(df)

df2 = df.groupby(by="group").sum() # this will give the summary you want
#print(df2)

df3 = pd.merge(df, df2, how='inner', on='group') # merge 2 of them
print(df3)

df3['avg_counts'] = df3['counts_x'] / df3['counts_y'] # add new calculated columns 
df3['avg_sum1'] = df3['sum1_x'] / df3['sum1_y']
df3['avg_sum2'] = df3['sum2_x'] / df3['sum2_y']
print(df3.head())

#if needed remove the columns you dont need

result

   group letters  counts_x  sum1_x  sum2_x  counts_y  sum1_y  sum2_y  avg_counts  avg_sum1  avg_sum2
0      1       a        20       3       2        60       5       5   0.333333      0.600  0.400000
1      1       b        40       2       3        60       5       5   0.666667      0.400  0.600000
2      2       a        60       3       5       150       8       9   0.400000      0.375  0.555556
3      2       b        90       5       4       150       8       9   0.600000      0.625  0.444444
  • +10 for _you probably is just lazy to do more research._ – Trenton McKinney Jul 28 '20 at 02:18
  • thanks +10 :) by the way Trenton, when I want to print(df3) --> has many columns, the print result kinda shortened it (not showing the all columns). how do I show the full columns? *Im still new in python too.. – Boyke Ferdinandes Jul 28 '20 at 02:22
  • `group letters counts_x sum1_x ... sum2_y avg_counts avg_sum1 avg_sum2` it show that ... if there are too many columnd to print out – Boyke Ferdinandes Jul 28 '20 at 02:23
  • 1
    `pd.set_option('display.max_columns', 700)` & `pd.set_option('display.max_rows', 400)` & `pd.set_option('display.min_rows', 10)` & `pd.set_option('display.expand_frame_repr', True)` – Trenton McKinney Jul 28 '20 at 02:23