I am new to Python and want to make a copy (x2) of an existing pandas DataFrame (x1), then set all of its values to another value (e.g. 5 or NaN). I attempted this as follows:

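import pandas as pd

x1 = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
x2 = x1      # binds a second name to the same object; no copy is made
x2[:] = 5    # overwrites every value through x2
x1           # shows 5 everywhere as well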

After setting all values of x2 to 5, however, the values of x1 change as well. I do not want the existing variable to be modified. Why is this happening and how can it be prevented? Thanks in advance!

AMC
Jelmer
  • This is how any object works in Python. Assignment isn't a copy like it is in C++. Use `x1.copy()`. – ggorlen Jun 26 '20 at 19:16
  • There is no copy being created anywhere. `x2 = x1` does not make a copy. **Assignment never copies anything in Python.** "This behavior of redefinition of existing variables is highly unwanted." Nothing is being *redefined*. – juanpa.arrivillaga Jun 26 '20 at 19:16
  • Although not about pandas exactly, you should read the following: https://nedbatchelder.com/text/names.html. It applies to all Python objects. Read it and learn to understand Python on its own terms, instead of trying to apply semantics from other languages (guessing C or C++) to Python. – juanpa.arrivillaga Jun 26 '20 at 19:17
  • Thanks juanpa.arrivillaga and ggorlen for the elaborations and references for further reading! – Jelmer Jun 26 '20 at 20:29

1 Answer

Writing x2 = x1 makes x2 refer to the same object as x1, so any change made through x2 is also visible through x1. To prevent that, make a copy of x1 and assign the copy to x2.

Try x2 = x1.copy()
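
As a minimal sketch of the difference, using the DataFrame from the question (the names alias and same_object are only for illustration): plain assignment gives a second name for the same object, while copy() returns an independent DataFrame that can be overwritten without touching x1.

```python
import pandas as pd

x1 = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# Plain assignment binds a second name to the same DataFrame object.
alias = x1
same_object = alias is x1

# copy() returns a new DataFrame; by default it is a deep copy,
# so the data is duplicated as well.
x2 = x1.copy()
x2[:] = 5  # overwrite every value in the copy only

print(same_object)  # whether alias and x1 are one object
print(x1)           # the original values
print(x2)           # the overwritten copy
```

Here `is` checks object identity, which is a quick way to tell whether two names refer to the same DataFrame before modifying either one.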

Zachary Oldham