Am I doing something wrong here, or is this a bug?

I intended df2 to be a copy/slice of df1. But as soon as I group df2 by column A, take the last value of column C in each group, and store it in a new column 'NewMisteryColumn', df1 also gets a 'NewMisteryColumn'.

The end result in df2 is correct. I know other ways to do this and am not looking for a different method; I just want to know whether I have stumbled upon a bug.

My question is: isn't df1 separate from df2? Why does df1 also get the same column?

import pandas as pd

df1 = pd.DataFrame({'A': ['some value', 'some value', 'another value'],
                    'B': ['rthyuyu', 'truyruyru', '56564'],
                    'C': ['tryrhyu', 'tryhyteru', '54676']})

df2 = df1

df2['NewMisteryColumn'] = df2.groupby(['A'])['C'].tail(1)
Jeff
  • If you're not intending to modify `df1` then take a copy `df2 = df1.copy()` – EdChum Aug 25 '16 at 13:04
  • `df1` and `df2` are just two names for the same object. That's how Python variable assignment works --- see [this answer](http://stackoverflow.com/a/6794990/509824) for a nice explanation with diagrams. – Alicia Garcia-Raboso Aug 25 '16 at 13:36

1 Answer

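This is not a bug. The assignment df2 = df1 does not copy anything: df2 is just another reference to the same DataFrame object, so a column added through df2 also shows up in df1. Call df1.copy() when you want an independent DataFrame.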

df2 = df1
df3 = df1.copy()

df1 is df2  # True
df1 is df3  # False

You can also verify this by comparing the object ids:

id(df1)
id(df2)  # Same as id(df1)
id(df3)  # Different!
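
To tie this back to the question, here is a minimal sketch of the same groupby assignment run once on a copy and once on a plain reference. The names `alias` and `independent` are illustrative only and do not appear in the original code; the sketch assumes pandas is installed.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['some value', 'some value', 'another value'],
                    'B': ['rthyuyu', 'truyruyru', '56564'],
                    'C': ['tryrhyu', 'tryhyteru', '54676']})

alias = df1                # second name bound to the same object
independent = df1.copy()   # new DataFrame with its own data

# Adding the column to the copy leaves df1 untouched.
independent['NewMisteryColumn'] = independent.groupby(['A'])['C'].tail(1)
print('NewMisteryColumn' in independent.columns)  # True
print('NewMisteryColumn' in df1.columns)          # False

# Adding the column through the alias modifies df1 itself.
alias['NewMisteryColumn'] = alias.groupby(['A'])['C'].tail(1)
print('NewMisteryColumn' in df1.columns)          # True
```

So replacing `df2 = df1` with `df2 = df1.copy()` in the question should keep df1 unchanged while df2 gets the new column.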
spadarian