I came across a pandas curiosity that I can't find described anywhere on SO. It looks like, in some cases, pandas DataFrames are treated as global variables inside Python functions rather than as local variables. For example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def some_function(x):
    x['new'] = 0
    return

some_function(df)
print(df)
   A  B  new
0  1  a    0
1  2  b    0
2  3  c    0
3  4  d    0
Experimenting further, I found that this behaviour stops as soon as the function starts copying the data:
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def some_function(x):
    y = x.copy()
    y['new'] = 0
    x = y.copy()
    return

some_function(df)
print(df)
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
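One way to probe the difference between the two examples is to check object identity with `is`. The sketch below is a minimal illustration, assuming only standard Python argument passing; the function names `add_column_in_place` and `add_column_to_copy` are hypothetical and not part of pandas. It contrasts modifying the object a parameter refers to with rebinding the parameter name to a different object.

```python
import pandas as pd

def add_column_in_place(x):
    # x is a local name bound to the SAME DataFrame object the caller passed
    x['new'] = 0  # column assignment mutates that shared object
    return x

def add_column_to_copy(x):
    y = x.copy()  # y is an independent DataFrame
    y['new'] = 0  # only the copy gets the new column
    x = y         # rebinds the local name x; the caller's variable is untouched
    return x

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})

result = add_column_to_copy(df)
print(result is df, 'new' in df.columns)  # False False

result = add_column_in_place(df)
print(result is df, 'new' in df.columns)  # True True
```

In this sketch, the assignment `x = y` inside the function only changes what the local name points to, while `x['new'] = 0` changes the object itself, which the caller's `df` also refers to.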
My question is: is this an intentional feature of pandas (and if so, what is its purpose?), or just an accidental side effect of how pandas DataFrames are stored and operated on in memory? As far as I can tell, it doesn't happen with numpy arrays.
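A minimal sketch for testing the numpy comparison, assuming the relevant distinction is between an operation that writes into the array and one that returns a new array (a numpy array cannot gain a column in place the way a DataFrame can). The helper names `assign_item` and `append_value` are hypothetical.

```python
import numpy as np

def assign_item(x):
    x[0] = 0  # item assignment writes into the array object the caller passed

def append_value(x):
    x = np.append(x, 0)  # np.append returns a NEW array; only the local name x is rebound

arr = np.array([1, 2, 3, 4])

append_value(arr)
print(arr)  # [1 2 3 4]  (caller's array unchanged)

assign_item(arr)
print(arr)  # [0 2 3 4]  (caller's array modified)
```

If the numpy experiment used something like `np.append` or `x = x + 1`, the caller's array would stay the same, whereas item assignment behaves like the first DataFrame example.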