I am confused about when changes made to global variables inside a function are retained after the function returns, when no global or nonlocal statement is present.

import pandas as pd
import numpy as np

d = {'col1': [1, 2], 'col2': [3, 4]}
e = {'col3': [5, 7], 'col4': [8, 9]}
df = pd.DataFrame(data=d)
dfe = pd.DataFrame(data=e)

print("function 1:")
def func1(df_name):
    df_name = df_name + 1
    df_name.drop(df_name.columns[0], axis=1, inplace=True)
    print("inside function:\n", df_name)
func1(df)
print("outside function:\n", df)
print("\n")

print("function 2:")
def func2(df_name, col_name):
    df_name['col6'] = df_name[col_name] + 1
    print("inside function:\n", df_name)
func2(df, "col1")
print("outside function:\n", df)
print("\n")

print("function 3:")
def func3(df_name1, df_name2):
    df_name1 = pd.concat([df_name1, df_name2], axis=1)
    print("inside function:\n", df_name1)
func3(df, dfe)
print("outside function:\n", df)

Output:

function 1:
inside function:
    col2
0     4
1     5
outside function:
    col1  col2
0     1     3
1     2     4

function 2:
inside function:
    col1  col2  col6
0     1     3     2
1     2     4     3
outside function:
    col1  col2  col6
0     1     3     2
1     2     4     3

function 3:
inside function:
    col1  col2  col6  col3  col4
0     1     3     2     5     8
1     2     4     3     7     9
outside function:
    col1  col2  col6
0     1     3     2
1     2     4     3

Function 1 shows that adding a value to the dataframe and then dropping a column is not retained, which is what I expected. Function 2 shows that adding a new column is retained - this is a surprise to me. Function 3, which I thought was just another way of adding columns, does not keep col3 and col4. What kind of namespace prison-break is going on in Function 2? In what other scenarios will I see this phenomenon again? Thanks.

BTW it is intentional that none of these functions has an explicit return as I am trying to understand what is going on. I am aware that it may not be the best practice.

Alex

1 Answer

df_name is a function parameter: a local name that initially refers to the same object as the argument passed in.

  1. func1:

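df_name = df_name + 1 creates a new DataFrame and rebinds the local name df_name to it, so the original object is unmodified. The drop(..., inplace=True) that follows then acts on that new object, not on the caller's df.

The same applies to func3: pd.concat returns a new DataFrame, and assigning it to df_name1 only rebinds the local name.

  2. func2:

df_name['col6'] = ... mutates the object that df_name refers to, which is the same object the caller's df refers to. It does not rebind the name, so the change is still visible after the function ends.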
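The difference is not specific to pandas; it shows up with any mutable object. Below is a minimal sketch using plain lists in place of DataFrames. The function names rebind and mutate are made up for illustration, and they return their local object only so the caller can inspect it with `is`.

```python
# Plain lists stand in for DataFrames; rebind/mutate are hypothetical names.

def rebind(items):
    # "+" builds a NEW list; the assignment points the local name at it.
    items = items + [99]
    return items  # returned only so the caller can compare identities


def mutate(items):
    # append changes the EXISTING list in place; the name is not rebound.
    items.append(99)
    return items  # returned only so the caller can compare identities


data = [1, 2]

rebound = rebind(data)
after_rebind = list(data)      # snapshot of data after the rebinding call
print(after_rebind, rebound is data)

mutated = mutate(data)
print(data, mutated is data)
```

The first call leaves data untouched and hands back a different object, like func1 and func3. The second call changes data itself, like func2. As a rule of thumb, `name = ...` inside a function rebinds, while item assignment (`name[key] = ...`), attribute assignment, and in-place methods mutate.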

If you need to rebind a global variable from inside a function, don't pass it as a parameter. Instead, add a global variable_name line in the function before assigning to it.
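A small sketch of what the global statement changes, assuming a module-level variable named counter (a name chosen only for this example):

```python
counter = 0  # hypothetical module-level variable


def bump_without_global():
    # Without a global statement, this assignment creates a local name;
    # the module-level counter is left untouched.
    counter = 1


def bump_with_global():
    # The global statement makes the assignment target the module-level name.
    global counter
    counter = counter + 1


bump_without_global()
after_local = counter   # value of the global after the first call

bump_with_global()
print(after_local, counter)
```

Only the second function changes the module-level value. The global statement matters for rebinding a name; mutating a global object in place (as func2 does through its parameter) does not need it.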

Read more here, interesting.

Priyesh Kumar