Calculate sum of column using Python Pandas

Question

I am automating the calculation of an Excel file, which looks as follows:

|        |Measure 1|Measure 2|
|Company1|    4    |    5    |
|Company2|    7    |    3    |
|Company3|    3    |    1    |
|Company4|    5    |    8    |

I want to calculate the total of Measure 1 and Measure 2 over all companies (So 19 and 17, respectively). I have imported the Excel file using pandas and am looking for a formula like SUM(B2:B5) and SUM(C2:C5). Ideally I would define new variables which equal to these totals so I can append them to the bottom.

Moreover, the number of companies can differ every time I run the Python code, so I would like it to work dynamically.

Thanks!

Your example is unclear. Do you want to get a grand total (`df.sum()`) or a total per company, assuming duplicates in the index (`df.groupby(level=0).sum()`)? — mozway, Mar 29 '22 at 13:46
@mozway I want to know how often we heave observed Measure 1 (grand total for all companies) and Measure 2 (grand total for all companies). Hopefully it is clear now. — Not_a_Robot, Mar 29 '22 at 13:53

Pietro D'Antuono · Answer 1 · 2022-03-29T14:35:34.787

0

Possibly a duplicate of this question

df = pd.DataFrame(
    {
        'Company': ["Company 1", "Company 2", "Company 3", "Company 4"],
        'Measure 1': [4, 7, 3, 5],
        'Measure 2': [5, 3, 1, 8]
    }
)
df.set_index("Company", inplace=True)
sum_of_cols = df.sum(numeric_only=True)
sum_of_cols.name = "Total"
df_t = df.append(sum_of_cols, ignore_index=False)  # <-- just append sum to df
print(df_t)

Outputs:

           Measure 1  Measure 2
Company                        
Company 1        4.0        5.0
Company 2        7.0        3.0
Company 3        3.0        1.0
Company 4        5.0        8.0
Total           19.0       17.0

Another interesting package that you could install (for usage in Jupyter Notebooks) is pivottablejs. Using pivottablejs you could do some Excel-like manipulations to your data (see attached picture).

from pivottablejs import pivot_ui

pivot_ui(df)

edited Mar 29 '22 at 14:35

answered Mar 29 '22 at 13:58

Pietro D'Antuono

352
1
11

If it's a duplicate, then it should be flagged as such, not answered. – BigBen Mar 29 '22 at 13:59
@BigBen I have added a reference to `pivottablejs` that is not covered by the possible answer. It could be relevant since the person who asked "comes from Excel". – Pietro D'Antuono Mar 29 '22 at 14:37
@PietroD'Antuono Thanks, this is really helpful! One more question: what I the first row also contains text, so the calculation should start in row 3 instead of row 2? I tried changing the index already but that does not work. – Not_a_Robot Mar 29 '22 at 14:43
If the row containing text is exactly the first one, set your DataFrame to `df = df[1:]` before performing the sum – Pietro D'Antuono Mar 29 '22 at 14:50

score 0 · Answer 2 · answered Mar 29 '22 at 14:27

Are you looking for:

out = df.assign(Total=df.sum(axis=1))
out = pd.concat([out, out.sum().to_frame('Total').T])
print(out)

# Output
          Measure 1  Measure 2  Total
Company1          4          5      9
Company2          7          3     10
Company3          3          1      4
Company4          5          8     13
Total            19         17     36

Calculate sum of column using Python Pandas

2 Answers2