I have been playing with Python and trying to understand the difference between copying a DataFrame with the .copy() method and simply assigning it to another variable.

Let's say we have the following DataFrame, dfx:

   Name        Score1   Score2  Score3        Score4
0  Jack            10  Perfect      10       Perfect
1  Jill            10       10      10  Not Finished
2  Jane            20       10      10             5
3   Tom  Not Finished       15      10             5

dfx2 = dfx.drop("Score1", axis=1)

dfx2:

   Name   Score2  Score3        Score4
0  Jack  Perfect      10       Perfect
1  Jill       10      10  Not Finished
2  Jane       10      10             5
3   Tom       15      10             5

Running dfx again still returns the original DataFrame:

   Name        Score1   Score2  Score3        Score4
0  Jack            10  Perfect      10       Perfect
1  Jill            10       10      10  Not Finished
2  Jane            20       10      10             5
3   Tom  Not Finished       15      10             5

Shouldn't the reassignment cause the column "Score1" to be dropped from the original dataset as well?

However, running the following:

dfx3 = dfx

dfx3

   Name        Score1   Score2  Score3        Score4
0  Jack            10  Perfect      10       Perfect
1  Jill            10       10      10  Not Finished
2  Jane            20       10      10             5
3   Tom  Not Finished       15      10             5

dfx3.loc[0, "Score4"] = "BAD"

dfx3

   Name        Score1   Score2  Score3        Score4
0  Jack            10  Perfect      10           BAD
1  Jill            10       10      10  Not Finished
2  Jane            20       10      10             5
3   Tom  Not Finished       15      10             5

dfx
   Name        Score1   Score2  Score3        Score4
0  Jack            10  Perfect      10           BAD
1  Jill            10       10      10  Not Finished
2  Jane            20       10      10             5
3   Tom  Not Finished       15      10             5

does cause the original dataset to be modified.

Why does dropping a column leave the original dataset unchanged, while changing an element modifies it? It also seems that any change to a column name in an assigned dataset modifies the original dataset.

marc_s
Ali Parahoo
  • `.drop()` explicitly returns a copy unless you set `inplace=True`. The other thing is a Python question about how object references work more than a pandas question. – CJR Jul 05 '19 at 17:58
  • `dfx3 = dfx.copy()` will clear this problem. Check [this](https://stackoverflow.com/questions/56748890/pandas-with-settingwithcopywarning/56749160#56749088) – anky Jul 05 '19 at 17:59

1 Answer

Both dfx3 and dfx refer to the same DataFrame, because dfx3 = dfx only binds a second name to the existing object; it does not create a new one. If you want to modify dfx3 without affecting dfx, make a copy of dfx instead of pointing both names at the same DataFrame:

dfx3 = dfx.copy()
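
To see all three cases side by side, here is a minimal sketch. It assumes a cut-down version of the DataFrame from the question (only three of its columns), and the name dfx_copy is made up for illustration. It uses `is` to check whether two names point at the same object:

```python
import pandas as pd

# Cut-down stand-in for the DataFrame in the question (assumed, not the full data).
dfx = pd.DataFrame({
    "Name": ["Jack", "Jill", "Jane", "Tom"],
    "Score1": [10, 10, 20, "Not Finished"],
    "Score4": ["Perfect", "Not Finished", 5, 5],
})

# Plain assignment binds a second name to the same object.
dfx3 = dfx
print(dfx3 is dfx)  # True

# drop() returns a new DataFrame by default, so dfx keeps Score1.
dfx2 = dfx.drop("Score1", axis=1)
print(dfx2 is dfx)              # False
print("Score1" in dfx.columns)  # True

# copy() gives an independent DataFrame; editing it leaves dfx alone.
dfx_copy = dfx.copy()
dfx_copy.loc[0, "Score4"] = "BAD"
value_after_copy_edit = dfx.loc[0, "Score4"]
print(value_after_copy_edit)  # Perfect

# Writing through the alias changes the one shared object.
dfx3.loc[0, "Score4"] = "BAD"
print(dfx.loc[0, "Score4"])  # BAD
```

So the drop did not touch dfx because it handed back a new object that was bound to dfx2, while the .loc assignment wrote into the object that dfx and dfx3 share.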
Joe