
I am new to Python. I wanted to try some simple function operations on dataframe but I encountered the following problem. My code is:

>>> df.head(3)
   PercChange
0    0.000000
1   -7.400653
2    2.176843
>>> def switch(array):
...     for i in range(len(array)):
...         if array[i]<0:
...             array[i]=0
...     return array
... 
>>> a=df.PercChange
>>> a=switch(a)
>>> df['PosPercChange']=a
>>> df.head(3)
   PercChange  PosPercChange
0    0.000000       0.000000
1    0.000000       0.000000
2    2.176843       2.176843

Why did my 'PercChange' column change as well? I already created a new variable for the operations separately. How can I avoid changing my 'PercChange' column? Thanks a lot.

[Solved]

So the problem is in how assignment works. In Python, '=' doesn't copy a value from one variable to another; instead, it gives the same object a second name, so changing one also changes the other. Thanks for the help.

  • When you assign a value to a variable in Python, it doesn't copy the value; the variable just becomes a new name for the same value. So, `a` and `df.PercChange` are the exact same `Series`, and a change to one affects the other. If you want to make a copy, you have to say so explicitly. Pandas, NumPy, and other libraries have specific ways to do different kinds of copying, while the `copy` module in the stdlib has the general functions `copy.copy` and `copy.deepcopy`; you have to decide what exactly you want in each case. – abarnert Jun 16 '18 at 04:38
  • Possible duplicate of [Emulating pass-by-value behaviour in python](https://stackoverflow.com/questions/845110/emulating-pass-by-value-behaviour-in-python) – David Zemens Jun 16 '18 at 04:43
  • You created a new *name* that is a pointer to the same object. So now you have two "variables" (names) referring to the same underlying object in memory. A change to one of them changes "both" variables, because they both point to the same *thing*. – David Zemens Jun 16 '18 at 04:47

1 Answer

When you assign a value to a variable in Python, it doesn't copy the value; the variable just becomes a new name for the same value.

So, `a` and `df.PercChange` are just different names for the exact same `Series`. The same way a change to "Star Wars V" affects "The Empire Strikes Back" or a change to "Former President George W. Bush" affects "President Bush 43", a change to `a` affects `df.PercChange`.

And calling a function is just assignment again: the parameter inside the function becomes another name for the same value as the argument in the function call, so `array` is the same object as `a` and `df.PercChange`.
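As a minimal sketch of that aliasing, here is the same pattern with a plain list standing in for the `Series`. The names `zero_negatives`, `data`, and `alias` are made up for illustration and are not from the question:

```python
def zero_negatives(values):
    # `values` is just another name for the caller's object, not a copy
    for i in range(len(values)):
        if values[i] < 0:
            values[i] = 0
    return values

data = [0.0, -7.4, 2.2]
alias = data                    # no copy: two names, one list
result = zero_negatives(alias)  # the parameter becomes a third name

print(result is data)  # True: the function returned the very same object
print(data)            # [0.0, 0, 2.2]: the "original" changed too
```

The `is` operator checks object identity, which makes it a quick way to tell whether two names refer to one object or to two equal ones.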

If you want to make `a` into a name for a copy of the same data as `df.PercChange`, instead of a name for the same object, you have to ask for that copy explicitly.


With Pandas, this is usually just the copy method:

a = df.PercChange.copy()    
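Put together with the code from the question, that might look like the sketch below. The three-row frame is rebuilt from the `df.head(3)` output in the question, so it is only assumed to resemble the real data:

```python
import pandas as pd

def switch(array):
    # Same loop as in the question: zero out negative entries in place
    for i in range(len(array)):
        if array[i] < 0:
            array[i] = 0
    return array

# Sample data assumed from the df.head(3) output in the question
df = pd.DataFrame({"PercChange": [0.0, -7.400653, 2.176843]})

a = df["PercChange"].copy()      # an independent Series, not another name
df["PosPercChange"] = switch(a)  # only the copy is modified

print(df)
```

Here `PercChange` keeps its negative value, and only `PosPercChange` has it replaced by zero.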

But Pandas (and the NumPy library that underlies it) allows for all kinds of complicated things, so there are other complicated ways to copy things.


More generally, Python has the `copy` module, with `copy` and `deepcopy` functions that can make shallow or deep copies of almost anything, not just Pandas `Series`.
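The difference between the two only shows up with nested objects. A small sketch using a list of lists (the variable names are illustrative):

```python
import copy

nested = [[1, -2], [3, -4]]
shallow = copy.copy(nested)   # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0][1] = 0              # mutate an inner list through the original

print(shallow[0][1])  # 0: the shallow copy sees the change
print(deep[0][1])     # -2: the deep copy does not
```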


But you're also halfway to a different solution. Your `switch` function does a `return array` at the end, and your caller does `a = switch(a)`.

If `switch` returned a different object, `a` would now be a name for that different object. But, because it instead just returns its parameter, after modifying it in-place, all that `a = switch(a)` is doing is re-asserting `a` as a name for the same value it's already a name for.

So, another way to fix things is to do the copying inside `switch`:

def switch(array):
    array = array.copy()
    for i in range(len(array)):
        if array[i]<0:
            array[i]=0
    return array

… or to build up a whole new array or Series and return that:

def switch(array):
    return array.apply(lambda x: 0 if x < 0 else x)
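Pandas also has vectorized methods that return a new `Series` without an explicit loop or per-element function. One option, not part of the original answer, is `Series.clip`, sketched here on sample values taken from the question:

```python
import pandas as pd

def switch(array):
    # clip(lower=0) returns a new Series with values below 0 raised to 0;
    # the input Series is left unchanged
    return array.clip(lower=0)

s = pd.Series([0.0, -7.400653, 2.176843])
out = switch(s)

print(out.tolist())  # [0.0, 0.0, 2.176843]
print(s.tolist())    # [0.0, -7.400653, 2.176843]
```

Because `switch` now returns a different object, `a = switch(a)` rebinds `a` to the new `Series` and leaves the original column alone.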
abarnert