
I am trying to add a row at the bottom of a DataFrame that shows the sum of certain columns. I assumed this would be a really simple operation, but to my surprise, none of the methods I found on SO works for me in one step.

The methods that I've found on SO:

  1. df.loc['TOTAL'] = df.sum()

This doesn't work for me as long as there are non-numeric columns in the DataFrame. I need to select the numeric columns first and then concat the non-numeric columns back.

  2. df.append(df.sum(numeric_only=True), ignore_index=True)

This won't preserve my data types: integer columns are converted to float.

  3. df.loc['Total', 'ColumnA'] = df['ColumnA'].sum()

I can only use this to sum one column.

I must have missed something in the process as this is not that hard an operation. Please let me know how I can add a sum row while preserving the data type of the dataframe.

Thanks.

Edit:

First off, sorry for the late update. I was on the road over the weekend.

Example:

import pandas as pd
df1 = pd.DataFrame(data={'CountyID': [77, 95], 'Acronym': ['LC', 'NC'], 'Developable': [44490, 56261], 'Protected': [40355, 35943],
                         'Developed': [66806, 72211]}, index=['Lehigh', 'Northampton'])

[Image: the DataFrame before the sum row is added]

What I want to get would be

[Image: the DataFrame with the sum row added]

Please ignore the differences in the index.

It's a little tricky for me because I don't need the sum of the 'CountyID' column, since it is only used for indexing. So the question is more about getting the sum of specific numeric columns.

Thanks again.

Bowen Liu
  • Please provide a sample DataFrame with 5-10 rows and clearly indicate your output. Your question (and the current answer) is ambiguous at best. – cs95 Feb 08 '19 at 22:02
  • @coldspeed I just did. Please lemme know if it works. – Bowen Liu Feb 11 '19 at 21:30

2 Answers


Here is some toy data to use as an example:

import pandas as pd
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [1, 2, 3], 'C': ['A', 'B', 'C']})

So that we can preserve the dtypes after the sum, we will store them as d

d = df.dtypes

Next, since we only want to sum numeric columns, pass numeric_only=True to sum(), but otherwise follow the same logic as your first attempt:

df.loc['Total'] = df.sum(numeric_only=True)

And finally, reset the dtypes of your DataFrame to their original values. astype returns a new DataFrame, so assign the result back to keep it.

df = df.astype(d)
df

         A  B    C
0      1.0  1    A
1      2.0  2    B
2      3.0  3    C
Total  6.0  6  NaN
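
For the case raised in the question's edit, where only some numeric columns should be summed, the steps above can be sketched as a small helper. This is only a sketch and not part of the original answer: add_total_row and its cols and label parameters are hypothetical names, and the example frame is the one from the question. It assumes that integer columns left out of the sum may be switched to pandas' nullable Int64 dtype, because a plain int64 column cannot hold the missing value in the total row.

```python
import pandas as pd


def add_total_row(df, cols, label="Total"):
    """Return a copy of df with an extra row summing only `cols` (hypothetical helper)."""
    original = df.dtypes
    out = df.copy()
    # Columns that are absent from the summed Series get NaN in the new row.
    out.loc[label] = out[cols].sum()
    # The summed columns contain no NaN, so their original dtypes can be restored.
    out = out.astype({c: original[c] for c in cols})
    # Assumption: integer columns that were not summed may become nullable "Int64",
    # which keeps their values integer and shows <NA> in the total row.
    for c in out.columns.difference(cols):
        if pd.api.types.is_integer_dtype(original[c]):
            out[c] = out[c].astype("Int64")
    return out


df1 = pd.DataFrame(
    {
        "CountyID": [77, 95],
        "Acronym": ["LC", "NC"],
        "Developable": [44490, 56261],
        "Protected": [40355, 35943],
        "Developed": [66806, 72211],
    },
    index=["Lehigh", "Northampton"],
)

result = add_total_row(df1, ["Developable", "Protected", "Developed"])
print(result)
print(result.dtypes)
```

The input frame is left unchanged because the helper works on a copy. If a plain int64 identifier column is required, the total row would need a placeholder value there instead of a missing one.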
DJK
  • Thanks. This is great. It's almost exactly what I want, except for the part that sometimes I don't need sum of certain numeric columns (like I showed in my edit). What shall I do then? – Bowen Liu Feb 11 '19 at 21:46

To select the numeric columns, you can do

df_numeric = df.select_dtypes(include = ['int64', 'float64'])
df_num_cols = df_numeric.columns

Then do what you did first (using what I found here)

df.loc['Total'] = pd.Series(df[df_num_cols].sum(), index = df_num_cols)
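
As a rough illustration of how this selection could skip an identifier column such as CountyID from the question, the numeric column labels can be trimmed with Index.drop before summing. This is a sketch under assumptions: the frame is the example from the question, and the names num_cols and totals are made up for illustration. It does not attempt to restore the original dtypes afterwards.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "CountyID": [77, 95],
        "Acronym": ["LC", "NC"],
        "Developable": [44490, 56261],
        "Protected": [40355, 35943],
        "Developed": [66806, 72211],
    },
    index=["Lehigh", "Northampton"],
)

# All numeric columns by dtype, minus the identifier column that should not be summed.
num_cols = df.select_dtypes(include="number").columns.drop("CountyID")

# Sum only the remaining columns; the result is a Series indexed by column name.
totals = df[num_cols].sum()

# Columns missing from `totals` (CountyID, Acronym) are left empty in the new row.
df.loc["Total"] = totals
print(df)
```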
m13op22
  • Alternatively, you can get the numeric columns by first selecting the columns that aren't numeric types with `df_not_numeric = df.select_dtypes(include = ['object', 'datetime'])` and then taking the difference: `df_num_cols = df.columns.difference(df_not_numeric.columns)` – m13op22 Feb 08 '19 at 22:09