
I came across a pandas curiosity, which I can't find replicated on SO. It looks like for some use cases, pandas dataframes are treated as global variables in python functions, not local variables. For example:

import pandas as pd

df = pd.DataFrame({'A':[1, 2, 3, 4],
                   'B':['a', 'b', 'c', 'd']})

def some_function(x):
    x['new'] = 0
    return

some_function(df)
print(df)

   A  B  new
0  1  a    0
1  2  b    0
2  3  c    0
3  4  d    0

Experimenting around, this behaviour stops as soon as you start copying data around within the function.

df = pd.DataFrame({'A':[1, 2, 3, 4],
                   'B':['a', 'b', 'c', 'd']})

def some_function(x):
    y = x.copy()
    y['new'] = 0
    x = y.copy()
    return

some_function(df)
print(df)

   A  B
0  1  a
1  2  b
2  3  c
3  4  d

My question is: is this an intentional feature of pandas (and if so, for what purpose?), or just an accidental side-effect of how pandas dataframes are stored and operated on in memory? It doesn't happen with numpy arrays, as far as I can tell.

Thomas
  • Hi, I think this link will be useful: [Original vs copy dataframe](https://stackoverflow.com/questions/48173980/pandas-knowing-when-an-operation-affects-the-original-dataframe) – Abdelfattah Boutanes Mar 29 '22 at 09:58


This is normal Python behaviour and not pandas-specific. Have a look at the following code:

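l = []

def a():
    l.append(42)  # mutates the list object bound to the global name l

def b():
    l = [1]       # binds a new local name l; the global l is untouched

a()
b()
print(l)  # [42]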
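The same distinction applies to NumPy arrays, even though the question suggests they behave differently. A minimal sketch (the function names `mutate` and `rebind` are hypothetical): in-place item assignment is visible to the caller, while rebinding the parameter to a new array is not.

```python
import numpy as np

arr = np.array([1, 2, 3])

def mutate(a):
    # Item assignment changes the array object that both `a` and `arr` refer to
    a[0] = 99

def rebind(a):
    # `a * 10` builds a new array; only the local name `a` is pointed at it
    a = a * 10

mutate(arr)
rebind(arr)
print(arr)  # [99  2  3]
```

So a NumPy array passed into a function is just as exposed to in-place changes as a DataFrame; only operations that create a new object and rebind a local name leave the caller's array alone.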

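In your case, the parameter x refers to the same DataFrame object as the global df, so x['new'] = 0 modifies that shared object in place. In the second case, x = y.copy() does not modify that object: it only rebinds the local name x to a new DataFrame, which is discarded when the function returns, so df is untouched. If you want to rebind the global name instead, you must declare it global in your function. A parameter cannot itself be declared global (that is a SyntaxError), so the function has to use the global name df directly:

def some_function():
    global df  # rebind the module-level name df instead of creating a local
    y = df.copy()
    y['new'] = 0
    df = y.copy()

some_function()
print(df)  # df now has the 'new' column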
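A common alternative to `global` is to have the function return a modified copy and let the caller decide whether to rebind the name. This is a minimal sketch of that pattern, not something from the original answer; the function name `add_new_column` is hypothetical.

```python
import pandas as pd

def add_new_column(frame):
    # Work on a copy so the caller's DataFrame is left untouched
    result = frame.copy()
    result['new'] = 0
    return result

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

df2 = add_new_column(df)
print(list(df.columns))   # ['A', 'B']
print(list(df2.columns))  # ['A', 'B', 'new']

# The caller explicitly rebinds its own name when it wants the change
df = add_new_column(df)
```

This keeps the data flow visible at the call site, which is usually easier to reason about than a function that silently rebinds a module-level name.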
Hatatister
  • Thanks! I'm too new to upvote, but that makes sense. So any global variables with an inbuilt method could be altered from within a function - but as soon as you define any variables (even with a name that shadows the global), it will remain a local variable unless specified as global. Is that right? Appreciate the help. – Thomas Mar 29 '22 at 11:16
  • Yes, but it is not restricted to methods. It works with attributes, too. `x.y = 42` will tell the interpreter: set the attribute "y" on the object labeled by x to the value 42. This works unless the definition of attributes is restricted by `__slots__`. But that is another topic – Hatatister Mar 29 '22 at 11:39
  • In addition there is the `nonlocal` statement, which tells the interpreter to use the variable from the nearest enclosing function scope, e.g. when an inner function refers to a variable of an outer function. Link to the Python docs: https://docs.python.org/3/reference/simple_stmts.html#the-global-statement – Hatatister Mar 29 '22 at 11:41
  • Very thorough, and again, much appreciated. – Thomas Mar 29 '22 at 12:04
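The comment about attributes can be illustrated with a small sketch. The classes `Plain` and `Slotted` and the function `set_y` are hypothetical names: assigning an attribute inside a function mutates the caller's object, unless `__slots__` forbids that attribute name.

```python
class Plain:
    pass

class Slotted:
    # __slots__ restricts instances to the listed attribute names
    __slots__ = ('a',)

def set_y(obj):
    # Attribute assignment mutates the object the caller passed in
    obj.y = 42

p = Plain()
set_y(p)
print(p.y)  # 42

s = Slotted()
raised = False
try:
    set_y(s)  # 'y' is not in __slots__, so this assignment fails
except AttributeError as exc:
    raised = True
    print('AttributeError:', exc)
```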
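The `nonlocal` statement mentioned in the comments can be sketched with a closure-based counter (the names `make_counter` and `increment` are hypothetical). Without `nonlocal`, the augmented assignment would make `count` local to the inner function.

```python
def make_counter():
    count = 0

    def increment():
        # Without `nonlocal`, `count += 1` would treat `count` as local to
        # increment() and raise UnboundLocalError on the first call
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
counter()
print(counter())  # 2
```

Each call to `make_counter()` creates a fresh `count`, so separate counters do not share state.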