This is one of the lines in my code where I get the SettingWithCopyWarning:

value1['Total Population']=value1['Total Population'].replace(to_replace='*', value=4)

I then changed it to:

row_index= value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 4

This still gives the same warning. How do I get rid of it?

I also get the same warning for a convert_objects(convert_numeric=True) call that I use. Is there any way to avoid that?

value1['Total Population'] = value1['Total Population'].astype(str).convert_objects(convert_numeric=True)

This is the warning message that I get:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy 
Henry Ecker
Pragnya Srinivasan
  • Because you are updating the same set of values in that dataframe at the same time, you can try using a temp variable to hold the result and then assign it back to the column; see if this helps get rid of the warning message. – Anzel Sep 14 '15 at 20:56
  • What version of Python and pandas are you using? – PabTorre Sep 14 '15 at 21:11

9 Answers

43

If you use .loc[row, column] and still get the same warning, the dataframe you are assigning to was probably created by slicing or filtering another dataframe. In that case, make it an explicit copy with .copy().

This is a step-by-step error reproduction:

import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
df
#   col1    col2
#0  1   3
#1  2   4
#2  3   5
#3  4   6

Creating a new column and updating its value:

df['new_column'] = None
df.loc[0, 'new_column'] = 100
df
#   col1    col2    new_column
#0  1   3   100
#1  2   4   None
#2  3   5   None
#3  4   6   None

No warning so far. But let's create another dataframe from the previous one:

new_df = df.loc[df.col1>2]
new_df
#   col1    col2    new_column
#2  3   5   None
#3  4   6   None

Now, using .loc, I will try to replace some values in the same manner:

new_df.loc[2, 'new_column'] = 100

However, I got this hateful warning again:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

SOLUTION

Calling .copy() when creating the new dataframe gets rid of the warning:

new_df_copy = df.loc[df.col1>2].copy()
new_df_copy.loc[2, 'new_column'] = 100

Now, you won't receive any warnings!

If your dataframe is created by filtering another dataframe and you intend to modify it, call .copy() when you create it.
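Applied to the question's case, the same idea might look like the sketch below. The `raw` frame, its `Region` column and the filter are made up for illustration, since the question doesn't show where `value1` comes from. `pd.to_numeric` is used here in place of `convert_objects`, which newer pandas releases no longer provide.

```python
import pandas as pd

# Hypothetical stand-in for the frame that value1 was filtered from.
raw = pd.DataFrame({
    "Region": ["A", "B", "C", "D"],
    "Total Population": ["120", "*", "87", "*"],
})

# Filter, then take an explicit copy so value1 owns its own data.
value1 = raw.loc[raw["Region"] != "A"].copy()

# Replace the '*' placeholder; the string "4" keeps the column
# uniformly typed until the numeric conversion below.
is_star = value1["Total Population"] == "*"
value1.loc[is_star, "Total Population"] = "4"

# Convert the column to numbers (anything unparseable would become NaN).
value1["Total Population"] = pd.to_numeric(
    value1["Total Population"], errors="coerce"
)
```

Because `value1` is an explicit copy, `raw` keeps its original `'*'` entries.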

Hadij
  • I'm getting this warning suddenly when I'm simply trying to add a new column using values from a list comprehension. I'm using .loc. There is no copy of a dataframe at play here. Seems like a bug. – Rexovas May 08 '23 at 00:05
  • @Rexovas You need to trace it back, and maybe in many lines beforehand, you had a copy of the dataframe somewhere... – Hadij May 08 '23 at 15:38
  • I was passing a sliced view of the dataframe to a function, which was then adding the new columns and saving the result as a CSV. Everywhere in the code I was using `.loc` so I'm not sure why the warning, but adding `df = df.copy()` in that final function fixed it. Still don't quite understand why it considered it a copy. – Rexovas May 09 '23 at 17:07
  • @Rexovas As I said, the problem is with sliced views of a dataframe. Whenever you filter a dataframe, do not forget to add `.copy()`. – Hadij May 10 '23 at 13:54
  • But using `.loc` should be sufficient as it guarantees the original dataframe is modified. If I add new columns to the slice, I would simply expect the original df to have null/nan values for the rows that did not exist in the slice. That’s the part I don’t understand. It happens to be that in this case it doesn’t matter if the original df is modified or not, but I’m still not sure why copy is necessary. – Rexovas May 11 '23 at 14:26
  • That's only a warning. For instance, my expectation is that slice should be completely separated from the original one! In that case, `.copy()` helps me give the slice independence. – Hadij May 11 '23 at 17:10
9

Have you tried setting the value directly?

value1.loc[value1['Total Population'] == '*', 'Total Population'] = 4
Alexander
3

I came here because I wanted to conditionally set the value of a new column based on the value in another column.

What worked for me was numpy.where:

import numpy as np
import pandas as pd
...

df['Size'] = np.where((df.value > 10), "Greater than 10", df.value)

From the numpy docs, this is equivalent to:

[xv if c else yv
 for c, xv, yv in zip(condition, x, y)]

Which is a pretty nice use of zip...
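A small self-contained sketch of the same approach, with made-up data (the `value` column and both labels are only for illustration). Both branches are strings here so that the new column holds a single type:

```python
import numpy as np
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"value": [5, 12, 30]})

# Pick one label where the condition holds and another where it does not.
df["Size"] = np.where(df["value"] > 10, "Greater than 10", "10 or less")
```

Since the whole column is built in one assignment on `df` itself, no chained indexing is involved.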

kztd
2

I don't know how bad the memory implications of this are, but it fixes the warning every time for an average-sized dataframe:

def addCrazyColFunc(df):
    dfNew = df.copy()
    dfNew['newCol'] = 'crazy'
    return dfNew

Just like the message says: make a copy and you're good to go. If someone can fix the above without the copy, please comment. None of the .loc suggestions above work for this case.
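One possible variant, offered only as a sketch: `DataFrame.assign` returns a new dataframe with the extra column and leaves its input alone, so the explicit `.copy()` line disappears. It still builds a new frame internally, so it is not a way around the memory cost. The names `add_crazy_col`, `source` and `subset` are made up for this example.

```python
import pandas as pd

def add_crazy_col(df):
    # assign() returns a new DataFrame; the frame passed in is not modified.
    return df.assign(newCol="crazy")

# Made-up data: a filtered frame like the ones that trigger the warning.
source = pd.DataFrame({"a": [1, 2, 3]})
subset = source[source["a"] > 1]

result = add_crazy_col(subset)
```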

blissweb
2

The warning is about whether the source df is also updated when you modify a replica of it obtained through a sliced index. If you only mean to update the replica, try adding pd.set_option('mode.chained_assignment', None) before the line where the warning is raised:

import pandas as pd

df_value = pd.DataFrame({ 'Total Population':['a','b','c','*'] })
value1 = df_value[ df_value['Total Population']=='*']

pd.set_option('mode.chained_assignment',  None) # <=== SettingWithCopyWarning Off

row_index = value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 44

pd.set_option('mode.chained_assignment',  'warn') # <=== SettingWithCopyWarning Default

WangSung
1

I was able to avoid the same warning message with syntax like this:

value1.loc[:, 'Total Population'].replace('*', 4)

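Note that nothing is assigned back here, unlike value1['Total Population']=value1['Total Population']... in the question. replace returns a new Series and leaves value1 unchanged, so the result has to be captured (for example in a new variable) to be of any use.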

abstrakkt
0

Got the solution:

I created a new DataFrame containing only the columns that I needed to work on, and it gives me no warnings now!

Strange, but worked.

Pragnya Srinivasan
0

Specifying that it is a copy worked for me. I just added .copy() at the end of the statement:

value1['Total Population'] = value1['Total Population'].replace(to_replace='*', value=4).copy()
Douglas Ferreira
Rodrigo
0

This should fix your problem (note the .loc, since value1[:, 'Total Population'] is not valid DataFrame indexing):

value1.loc[:, 'Total Population'] = value1.loc[:, 'Total Population'].replace(to_replace='*', value=4)