92

I have a DataFrame with numerical values. What is the simplest way of appending a row (with a given index value) that represents the sum of each column?

Air
  • 8,274
  • 2
  • 53
  • 88
user3132783
  • 5,275
  • 3
  • 15
  • 7

9 Answers9

106

To add a Total column which is the sum across the row:

df['Total'] = df.sum(axis=1)
glifchits
  • 683
  • 7
  • 26
ideate
  • 1,193
  • 1
  • 7
  • 2
102

To add a row with column-totals:

df.loc['Total']= df.sum()
BjoernL.
  • 1,211
  • 1
  • 9
  • 8
50

** Get Both Column Total and Row Total **

This gives total on both rows and columns:

import numpy as np
import pandas as pd


df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})

df.loc['Column_Total']= df.sum(numeric_only=True, axis=0)
df.loc[:,'Row_Total'] = df.sum(numeric_only=True, axis=1)

print(df)

                 a      b    c  Row_Total
0             10.0  100.0    a      110.0
1             20.0  200.0    b      220.0
Column_Total  30.0  300.0  NaN      330.0
BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169
  • solved it for me. Is there a YouTube video I can watch that explains how this works. Just reading the documentation is not the best when you're a visual learner like me. – JQTs Apr 04 '22 at 16:00
  • This worked for me, but only after deleting `.loc`. Not sure why. – Sam R Aug 10 '22 at 17:22
8

One way is to create a DataFrame with the column sums, and use DataFrame.append(...). For example:

import numpy as np
import pandas as pd
# Create some sample data
df = pd.DataFrame({"A": np.random.randn(5), "B": np.random.randn(5)}) 
# Sum the columns:
sum_row = {col: df[col].sum() for col in df}
# Turn the sums into a DataFrame with one row with an index of 'Total':
sum_df = pd.DataFrame(sum_row, index=["Total"])
# Now append the row:
df = df.append(sum_df)
David M.
  • 186
  • 2
  • 8
5

I have done it this way:

df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Grand Total']).T])

this will add a column of totals for each row:

df = pd.concat([df,pd.DataFrame(df.sum(axis=1),columns=['Total'])],axis=1)

It seems a little annoying to have to turn the Series object (or in the answer above, dict) back into a DataFrame and then append it, but it does work for my purpose.

It seems like this should just be a method of the DataFrame - like pivot_table has margins.

Perhaps someone knows of an easier way.

Gorgsenegger
  • 7,356
  • 4
  • 51
  • 89
George Luft
  • 61
  • 1
  • 4
5

You can use the append method to add a series with the same index as the dataframe to the dataframe. For example:

df.append(pd.Series(df.sum(),name='Total'))
Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67
yihui.dev
  • 602
  • 8
  • 10
1
  1. Calculate sum and convert result into list(axis=1:row wise sum, axis=0:column wise sum)
  2. Add result of step-1, to the existing dataFrame with new name
new_sum_col = list(df.sum(axis=1))
df['new_col_name'] = new_sum_col
FBruzzesi
  • 6,385
  • 3
  • 15
  • 37
Biplab Malakar
  • 750
  • 6
  • 11
0

I did not find the modern pandas approach! This solution is a bit dirty due to two chained transposition, I do not know how to use .assign on rows.

# Generate DataFrame
import pandas as pd
df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})

# Solution
df.T.assign(Total = lambda x: x.sum(axis=1)).T

output:

    a    b  c  Total
0  10  100  a    110
1  20  200  b    220

Adrien Pacifico
  • 1,649
  • 1
  • 15
  • 33
0

For those that have trouble because the result is 0 or NaN, check dtype first.

df.dtypes

Since sum can only process numeric try to change the type of your dataframe first. In this example, chang to int32 for integer.

df = df.astype('int32')
df.dtypes

Then, you should be able to sum across row and add new column (as the accepted answer, not the question).

df['sum']= df.sum(numeric_only=True,axis=1)

Bonus: Sort the sum column

df.sort_values(by=['sum'])
Muhammad Yasirroni
  • 1,512
  • 12
  • 22