Pandas: best way to add a total row that calculates the sum of specific (multiple) columns while preserving the data type

Question

I am trying to create a row at the bottom of a dataframe to show the sum of certain columns. I was under the impression that this would be a really simple operation, but to my surprise, none of the methods I found on SO works for me in one step.

The methods that I've found on SO:

df.loc['TOTAL'] = df.sum()

This doesn't work for me when there are non-numeric columns in the dataframe. I have to select the numeric columns first and then concat the non-numeric columns back.

df.append(df.sum(numeric_only=True), ignore_index=True)

This doesn't preserve my data types: integer columns are converted to float.
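Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the same attempt with pd.concat, on a small hypothetical frame, reproduces the upcast: the numeric totals come back as a single Series, and since that Series holds the float total of A next to the int total of B, it is float64 throughout, so stacking it under the frame turns int column B into float64.

```python
import pandas as pd

# Small hypothetical frame: a float column, an int column and a string column.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [1, 2, 3], 'C': ['A', 'B', 'C']})

# The numeric totals come back as one Series; because it holds the float
# total of A alongside the int total of B, the whole Series is float64.
totals = df.sum(numeric_only=True)

# pd.concat stands in for the removed DataFrame.append: the Series becomes a
# one-row frame, and stacking it under df upcasts int column B to float64.
out = pd.concat([df, totals.to_frame().T], ignore_index=True)
print(out.dtypes)
```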

df.loc['Total', 'ColumnA'] = df['ColumnA'].sum()

I can only use this to sum one column.

I must be missing something, since this shouldn't be a hard operation. Please let me know how I can add a sum row while preserving the data types of the dataframe.

Thanks.

Edit:

First off, sorry for the late update. I was on the road last weekend.

Example:

import pandas as pd

df1 = pd.DataFrame(data={'CountyID': [77, 95], 'Acronym': ['LC', 'NC'],
                         'Developable': [44490, 56261], 'Protected': [40355, 35943],
                         'Developed': [66806, 72211]},
                   index=['Lehigh', 'Northampton'])

What I want to get would be

Please ignore the differences in the index.

It's a little tricky for me because I don't need the sum of the column 'CountyID', since it's used for indexing. So the question is really about getting the sum of specific numeric columns.

Thanks again.

Please provide a sample DataFrame with 5-10 rows and clearly indicate your output. Your question (and the current answer) is ambiguous at best. — cs95, Feb 08 '19 at 22:02

score 4 · Accepted Answer · answered Feb 09 '19 at 00:10

Here is some toy data to use as an example:

import pandas as pd
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [1, 2, 3], 'C': ['A', 'B', 'C']})

So that we can restore the dtypes after the sum, store them in d:

d = df.dtypes

Next, since we only want to sum the numeric columns, pass numeric_only=True to sum(), but otherwise follow the same logic as your first attempt:

df.loc['Total'] = df.sum(numeric_only=True)

And finally, reset the dtypes of your DataFrame to their original values.

df = df.astype(d)
print(df)

         A  B    C
0      1.0  1    A
1      2.0  2    B
2      3.0  3    C
Total  6.0  6  NaN

Thanks. This is great. It's almost exactly what I want, except that sometimes I don't need the sum of certain numeric columns (as I showed in my edit). What should I do then? – Bowen Liu, Feb 11 '19 at 21:46
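A minimal sketch of one way to handle this with the df1 from the question's edit, assuming the columns to total are listed explicitly (sum_cols is a name chosen here for illustration): pass a list of columns to .loc, then restore only those columns' dtypes. The new row leaves CountyID empty, and a plain int64 column cannot hold NaN, so this sketch casts CountyID to pandas' nullable 'Int64' dtype instead.

```python
import pandas as pd

df1 = pd.DataFrame(
    data={'CountyID': [77, 95], 'Acronym': ['LC', 'NC'],
          'Developable': [44490, 56261], 'Protected': [40355, 35943],
          'Developed': [66806, 72211]},
    index=['Lehigh', 'Northampton'])

# Hypothetical list of columns to total; CountyID is an identifier, so it is left out.
sum_cols = ['Developable', 'Protected', 'Developed']

# Record the original dtypes before adding the row.
d = df1.dtypes

# Only the chosen columns get a total; the other cells of the new row are NaN.
df1.loc['Total', sum_cols] = df1[sum_cols].sum()

# Restore the summed columns' dtypes. CountyID now contains NaN, which int64
# cannot hold, so it is cast to the nullable 'Int64' dtype instead.
df1 = df1.astype({**d[sum_cols].to_dict(), 'CountyID': 'Int64'})
print(df1)
```

If a nullable dtype is not wanted, dropping the 'CountyID' entry from the astype dict leaves that column as float64; casting it back to plain int64 would fail because of the NaN.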

m13op22 · Answer 2 · score 0 · 2019-02-08T21:57:28.763

To select the numeric columns, you can do:

df_numeric = df.select_dtypes(include='number')
df_num_cols = df_numeric.columns

Then do what you did first, restricted to those columns (using what I found here):

df.loc['Total'] = pd.Series(df[df_num_cols].sum(), index=df_num_cols)
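A sketch of the same idea on the df1 from the question, assuming identifier columns are removed from the numeric selection by name with Index.drop. The column sums already come back as a Series indexed by column name, so the assignment also works without a pd.Series(...) wrapper. This version does not restore dtypes, so the summed int columns will typically come back as float64.

```python
import pandas as pd

df1 = pd.DataFrame(
    data={'CountyID': [77, 95], 'Acronym': ['LC', 'NC'],
          'Developable': [44490, 56261], 'Protected': [40355, 35943],
          'Developed': [66806, 72211]},
    index=['Lehigh', 'Northampton'])

# All numeric columns, minus the identifier column that should not be totalled.
df_num_cols = df1.select_dtypes(include='number').columns.drop('CountyID')

# The sums form a Series indexed by column name; columns missing from it
# (CountyID, Acronym) are filled with NaN in the new row.
df1.loc['Total'] = df1[df_num_cols].sum()
print(df1)
```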

edited Feb 08 '19 at 21:57

answered Feb 08 '19 at 21:52

Alternatively, you can get the numeric columns by first selecting the columns that aren't numeric, with `df_not_numeric = df.select_dtypes(include=['object', 'datetime'])`, and then `df_num_cols = df.columns.difference(df_not_numeric.columns)` – m13op22 Feb 08 '19 at 22:09
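select_dtypes also takes an exclude argument, which gives the same set of columns in one call. A small sketch on the question's df1 comparing both routes; one difference worth knowing is that Index.difference sorts its result by default, while select_dtypes keeps the original column order.

```python
import pandas as pd

df1 = pd.DataFrame(
    data={'CountyID': [77, 95], 'Acronym': ['LC', 'NC'],
          'Developable': [44490, 56261], 'Protected': [40355, 35943],
          'Developed': [66806, 72211]},
    index=['Lehigh', 'Northampton'])

# Route from the comment: pick the non-numeric columns, then take the rest.
df_not_numeric = df1.select_dtypes(include=['object', 'datetime'])
via_difference = df1.columns.difference(df_not_numeric.columns)

# One-call route: exclude the same dtypes directly.
via_exclude = df1.select_dtypes(exclude=['object', 'datetime']).columns

print(list(via_difference))  # sorted by Index.difference
print(list(via_exclude))     # original column order
```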
