I'm trying to create a moving average column for my data called 'mv_avg'. I'm getting a SettingWithCopyWarning that I have been unable to fix. I could suppress the warning, but I cannot figure out where in my code I am creating a copy, and I want to utilize best practices. I've created a generalizable example below to illustrate the problem.

import pandas as pd
data = {'category' : ['a', 'a', 'a', 'b', 'b', 'b'], 'value' : [1,2,3,4,5,6]}
df = pd.DataFrame(data)
df_a = df.loc[df['category'] == 'a']
df_a['mv_avg'] = df_a['value'].rolling(window=2).mean()

This returns:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I've also tried the more verbose version:

df_a.loc[: , 'mv_avg'] = df_a.loc[:,'value'].rolling(window=2).mean()

but I get the same warning. What is the best way to accomplish this without the warning?

Charles

2 Answers

You can make an explicit copy with .copy(), so that the new column is assigned to an independent DataFrame:

import pandas as pd
data = {'category' : ['a', 'a', 'a', 'b', 'b', 'b'], 'value' : [1,2,3,4,5,6]}
df = pd.DataFrame(data)
df_a = df.loc[df['category'] == 'a'].copy()
df_a['mv_avg'] = df_a['value'].rolling(window=2).mean()

Or you can use an indexer, such as:

import pandas as pd
data = {'category' : ['a', 'a', 'a', 'b', 'b', 'b'], 'value' : [1,2,3,4,5,6]}
df = pd.DataFrame(data)
indexer = df[df['category'] == 'a'].index
df_a = df.loc[indexer, :]
df_a['mv_avg'] = df_a['value'].rolling(window=2).mean()
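
As a rough check of the .copy() variant above, the following sketch records any warnings raised during the assignment and confirms that the original df is left untouched. The name copy_warnings and the match on the warning's class name are assumptions made for illustration, not part of the original answer.

```python
import warnings
import pandas as pd

data = {'category': ['a', 'a', 'a', 'b', 'b', 'b'], 'value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

# Record every warning raised inside the block instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df_a = df.loc[df['category'] == 'a'].copy()  # explicit, independent copy
    df_a['mv_avg'] = df_a['value'].rolling(window=2).mean()

# Hypothetical helper list: keep only warnings whose class name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(df_a)
print(len(copy_warnings))      # number of SettingWithCopy warnings recorded
print('mv_avg' in df.columns)  # whether the original frame gained the column
```

With a window of 2, the first row of mv_avg has no full window and is NaN; the remaining rows are the means of consecutive pairs of values.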
Steven G
  • In the case of explicit copying (`df_a = df.loc[df['category'] == 'a'].copy()`), is the data copied twice (first by boolean indexing and then by `.copy()`)? – Aivar Nov 17 '17 at 16:41
  • @Aivar, it is unclear; it may or may not be a double copy. (This comes from an issue I raised on the pandas GitHub.) – Steven G Nov 17 '17 at 17:01

Here are three options:

  1. Ignore or filter the warning; in this case it is spurious, as you are deliberately assigning to a filtered DataFrame.

  2. If you are done with df, you could del it, which will prevent the warning, because df_a will no longer hold a reference to df.

  3. Take a copy, as in the other answer.
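
Option 1 could be sketched with the standard-library warnings module, limiting the filter to the single assignment. The message pattern below is an assumption based on the warning text quoted in the question (the leading \s* allows for a newline before the message), and depending on the pandas version the warning may not be raised at all.

```python
import warnings
import pandas as pd

data = {'category': ['a', 'a', 'a', 'b', 'b', 'b'], 'value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df_a = df.loc[df['category'] == 'a']

# Suppress the warning only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    # Assumed pattern: matches the start of the message quoted in the question.
    warnings.filterwarnings(
        "ignore",
        message=r"\s*A value is trying to be set on a copy",
    )
    df_a['mv_avg'] = df_a['value'].rolling(window=2).mean()

print(df_a)
```

Scoping the filter to one block keeps the warning active elsewhere, where it may point to a real bug.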

chrisb