13

I am trying to sort a DataFrame by its Total column:

df.sort_values(by='Total', ascending=False, axis=0, inplace=True)

But I'm getting the following warning:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

The link in the warning suggests using .loc. I then read the .sort_values() documentation, which led me to think I should use inplace=False or None.

My question: if my DataFrame is not sorted and I don't use inplace=True, will df itself be sorted for further use, or do I have to assign the result to a new name and keep that?

greybeard
Ayan Chowdhury
    I had exactly the same issue, and I don't think inplace = True should do that, but I'm a tyro at this so I hesitate to say it is a bug, it just looks like it to me. I dropped the inplace and instead assigned the dataframe and everything was OK. inplace does support True at the link you gave – Julian Moore Feb 29 '20 at 12:26
    @JulianMoore Agreed. This is a bug in my eyes. – jlplenio Sep 23 '21 at 09:18
    This error is likely happening because of code that you have before sorting, where you set df to a copy of df. A common example is `df = df[['columnA', 'columnB']]`. If you share previous lines of code, happy to help you with a solution. – Scott Guthart Mar 27 '22 at 22:16

3 Answers

6

The warning isn't clear, but it should go away if you combine .loc with .copy() when you create df by filtering another DataFrame.

import pandas as pd

df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})

# .loc without .copy()
df_2 = df.loc[df.num < 5]
df_2.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# leads to SettingWithCopyWarning

# .loc with .copy()
df_3 = df.loc[df.num < 5].copy()
df_3.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# no warning

You will find some more details here, but in short there is a really annoying class of pandas bugs that SettingWithCopyWarning is trying to protect you from.

df_4 = df.copy()   # independent copy
df_4['new_col'] = df_4.num * 2
df_5 = df          # just another name for the same object
df_5['new_col_2'] = df_5.num * 3

# df_5's column is also added to df, but df_4's is not, because of .copy()
df.columns
# Index(['num', 'Total', 'new_col_2'], dtype='object')

# chained indexing: the assignment goes to a temporary copy
df[df.num < 2].loc[:, ['Total']] = 100
df.Total.max()
# still 29: because of the chained indexing, Total was not updated
df.loc[df.num < 2, 'Total'] = 100
df.Total.max()
# 100
oli5679
1

I would just avoid the inplace operation and store the sorted dataframe like this:

df_cp = df.copy()  # optional copy of the original df
df_sort = df_cp.sort_values(by='Total', ascending=False, axis=0)
del df  # delete df if it is no longer needed

I've come across copy warnings that seem like bugs, so currently I prefer avoiding syntax that may trigger them.
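
To address the original question directly, here is a minimal sketch of what each form of the call leaves behind. The small DataFrame and the names sorted_df and result are made up for illustration; df here is built from scratch rather than sliced from another frame.

```python
import pandas as pd

# Illustrative data only, not taken from the question.
df = pd.DataFrame({"num": range(5), "Total": [3, 1, 4, 1, 5]})

# Without inplace, sort_values returns a new, sorted DataFrame.
sorted_df = df.sort_values(by="Total", ascending=False)
print(df["Total"].tolist())         # [3, 1, 4, 1, 5] (original order kept)
print(sorted_df["Total"].tolist())  # [5, 4, 3, 1, 1]

# With inplace=True, the call returns None and df itself is reordered.
result = df.sort_values(by="Total", ascending=False, inplace=True)
print(result)                       # None
print(df["Total"].tolist())         # [5, 4, 3, 1, 1]
```

So without inplace=True the original stays unsorted, and the sorted data is only available under whatever name you assign the return value to.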

SariO
1

Using assignment (rather than inplace) clears the warning for me:

df = df.sort_values(by='Total', ascending=False, axis=0)
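
As a rough check, assuming the DataFrame was created by filtering another one (as in the example from the other answer), the sketch below records any warnings raised while sorting by assignment. The name subset is hypothetical, and the category is matched by name so the snippet does not depend on where a given pandas version defines SettingWithCopyWarning.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"num": range(10), "Total": range(20, 30)})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset = df.loc[df.num < 5]  # filtered subset, no .copy()
    # Rebind the name to the sorted result instead of sorting in place.
    subset = subset.sort_values(by="Total", ascending=False, axis=0)

copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(copy_warnings))        # 0
print(subset["Total"].tolist())  # [24, 23, 22, 21, 20]
```

The sorted frame stays available under the rebound name for any later steps, and the original df is left untouched.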
Thomas Matthew