I'm working with a dataframe 'copy' created by subsetting a previous one (see below):

import random
import pandas as pd
df = pd.DataFrame({'data':list(random.sample(range(10,100),25))})
df_filtered = df.query('data > 20 and data < 80')
df_filtered.rename(columns={'data':'observations'},inplace=True)

The problem is that when the rename method is called I receive a SettingWithCopyWarning, which, as I understand it, means I'm operating on a copy of the original object (df in this case). The warning text is: "A value is trying to be set on a copy of a slice from a DataFrame"

I found this question, which was answered using a different approach to subsetting. I prefer the DataFrame.query() method myself (syntax-wise). Is there a way I can create a new DataFrame object using the .query() method rather than the method suggested in the question I linked? I've tried a few options with iloc but haven't been successful thus far.

Sevyns
  • What is your goal? Do you want to have a DF with independent values (a copy)? NOTE: it'll cost you additional memory. – MaxU - stand with Ukraine Jun 28 '16 at 22:27
  • In this example, yes: my intention was for df_filtered to be an entirely different object, independent of df. I realize both objects will be in memory, but that's ok for this example. – Sevyns Jun 28 '16 at 22:40

2 Answers

You can always explicitly make a copy by calling .copy() on your filtered dataframe. Concretely, replace

df_filtered = df.query('data > 20 and data < 80')

with

df_filtered = df.query('data > 20 and data < 80').copy()

Does that get rid of the warning?
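
As a minimal sketch of the suggestion above (the fixed random seed is my own addition so the run is repeatable, and the prints are only there to check the result), the copy can be renamed in place while the original df keeps its column name:

```python
import random
import pandas as pd

random.seed(0)  # assumption: fixed seed, only so the example is repeatable
df = pd.DataFrame({'data': random.sample(range(10, 100), 25)})

# .copy() makes df_filtered an independent DataFrame rather than a possible view
df_filtered = df.query('data > 20 and data < 80').copy()
df_filtered.rename(columns={'data': 'observations'}, inplace=True)

print(list(df.columns))           # ['data']
print(list(df_filtered.columns))  # ['observations']
```

The trade-off is the extra memory for the copied rows, as mentioned in the comments on the question.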

Alicia Garcia-Raboso
  • That took care of it as well. So if I don't call the copy function is pandas creating a different type of object when you subset a dataframe (rather than a new object)? I've tried reviewing the documentation but I don't understand what's occurring yet... my biggest concern is I want to make sure it's doing what I want. – Sevyns Jun 28 '16 at 22:33
  • Without calling `.copy()`, `df_filtered` may be a view of the original `df`; there's no way of knowing until runtime. That's why you're getting the warning. There's a lot of defensive copying in the Pandas code, but it is not universal, since many times you do want a view instead of a copy. – Alicia Garcia-Raboso Jun 28 '16 at 22:45
  • Thanks for the info Alberto! Do you know if the method Max suggests (where assignment is done directly with an equals operator instead of using the inplace parameter) behaves the same way as calling the copy() function explicitly? Just trying to understand if there is more nuanced behavior I should be aware of :) – Sevyns Jun 28 '16 at 22:50
  • Yes, without the `inplace=True` you're creating a new dataframe, so the assignment MaxU suggests ends up giving you the exact same result as what I propose. – Alicia Garcia-Raboso Jun 28 '16 at 22:54
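
To check the claim in the last comment, here is a rough sketch that builds the result both ways and compares them. The small hand-picked values and the names via_copy and via_assign are assumptions for illustration only, not part of the original question:

```python
import pandas as pd

# assumption: small fixed data instead of random.sample, so the result is predictable
df = pd.DataFrame({'data': [15, 25, 50, 75, 90]})
mapping = {'data': 'observations'}

# Approach 1: explicit copy, then rename in place
via_copy = df.query('data > 20 and data < 80').copy()
via_copy.rename(columns=mapping, inplace=True)

# Approach 2: no inplace, reassign the DataFrame returned by rename()
via_assign = df.query('data > 20 and data < 80')
via_assign = via_assign.rename(columns=mapping)

print(via_copy.equals(via_assign))  # True
print(list(df.columns))             # ['data']
```

Both routes end with the same renamed data, and neither touches the column name in df.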

Try this instead of using inplace=True:

In [12]: df_filtered = df.query('data > 20 and data < 80')

In [13]: df_filtered = df_filtered.rename(columns={'data':'observations'})

The .rename() function returns a new object, so you can simply overwrite your DF with the returned new DF.
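
A small sketch of that behavior, using made-up values and the hypothetical names filtered and renamed: without inplace, rename() hands back a different object and leaves the one it was called on unchanged.

```python
import pandas as pd

# assumption: small fixed data, chosen only for illustration
df = pd.DataFrame({'data': [15, 25, 50, 75, 90]})
filtered = df.query('data > 20 and data < 80')

# rename() without inplace returns a new DataFrame
renamed = filtered.rename(columns={'data': 'observations'})

print(renamed is filtered)     # False
print(list(filtered.columns))  # ['data']
print(list(renamed.columns))   # ['observations']
```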

If you use inplace, the following happens.

From the docs:

inplace : boolean, default False

Whether to return a new DataFrame. If True then value of copy is ignored.

Returns:

renamed : DataFrame (new object)

PS: basically, you should try to avoid using inplace=True and use the df = df.function(...) technique instead.

MaxU - stand with Ukraine
  • Thanks Max - the warning went away but why would forcing assignment like this work when the inplace parameter doesn't? I suppose I could get used to this but I'd like to understand why this works over what I tried... any thoughts? – Sevyns Jun 28 '16 at 22:30
  • Why should you "avoid using inplace=True and use df = df.function(...)"? – Merlin Jun 29 '16 at 01:07