Why do my values not change after running my function?

Question

I am trying to remove outliers from columns. Let's say I have:

import pandas as pd
rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})

If I do:

rand_df = rand_df[rand_df['A'] > 2]

I get a new DataFrame, which is what I want. However, if I try:

def some_fxn(df, col):
    df = df[df[col] > 2]

some_fxn(rand_df, 'A')

My DataFrame is unaltered. What do I need to change to make this function work?

score 2 · Answer 1 · answered Feb 06 '21 at 16:17

Don't rely on assigning inside the function; instead, return the result and assign it at the call site:

def some_fxn(df, col):
    return df[df[col] > 2]

rand_df = some_fxn(rand_df, 'A')  # assign back to rand_df to update it, or to another variable to keep the original
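Put together, a runnable sketch of this approach might look like the following. It reuses the question's rand_df and some_fxn; the name filtered is only an illustrative choice. Assigning to a new name leaves rand_df with all three rows, while reassigning rand_df replaces it with the filtered result.

```python
import pandas as pd

def some_fxn(df, col):
    # Return the filtered rows instead of rebinding the local name
    return df[df[col] > 2]

rand_df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

filtered = some_fxn(rand_df, "A")   # new variable: rand_df keeps all 3 rows
print(len(rand_df), len(filtered))  # 3 1

rand_df = some_fxn(rand_df, "A")    # reassign: rand_df now holds only the filtered rows
print(rand_df)
```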

anky

score 1 · Answer 2 · answered Feb 06 '21 at 16:17

You need to use return at the end of your function. A function without an explicit return statement returns None.

def some_fxn(df, col):
    return df[df[col] > 2]

some_fxn(rand_df, 'A')

Out[412]: 
   A  B
2  3  6
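To see the difference side by side, here is a small sketch with two hypothetical helpers, filter_no_return (the question's version) and filter_with_return (this answer's version). The first call evaluates to None; the second yields the filtered DataFrame, and rand_df itself stays the same size until you reassign it.

```python
import pandas as pd

def filter_no_return(df, col):
    # Rebinds the local name only; nothing is handed back to the caller
    df = df[df[col] > 2]

def filter_with_return(df, col):
    # Hands the filtered DataFrame back to the caller
    return df[df[col] > 2]

rand_df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(filter_no_return(rand_df, "A"))   # None
result = filter_with_return(rand_df, "A")
print(result.shape)                      # (1, 2)
print(rand_df.shape)                     # (3, 2): unchanged until you reassign
```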

sophocles

score 1 · Answer 3 · answered Feb 06 '21 at 16:22

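You're expecting pass-by-reference behavior. Python has neither pass-by-reference nor pass-by-value; it just binds names to objects. Inside some_fxn, df = df[df[col] > 2] rebinds the local name df to a new DataFrame, while the caller's rand_df still refers to the original one.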
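A minimal sketch of the distinction between rebinding a name and mutating an object (the helpers rebind and mutate are hypothetical, and adding column C is just for illustration): filtering creates a new object, so the caller never sees it, whereas changing the object that both names point to is visible to the caller.

```python
import pandas as pd

def rebind(df):
    before = id(df)
    # Filtering builds a new DataFrame; the local name df now points at it
    df = df[df["A"] > 2]
    return before == id(df)

def mutate(df):
    # Adding a column modifies the object itself, which the caller also refers to
    df["C"] = df["A"] * 10

rand_df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(rebind(rand_df))          # False: the filter produced a different object
print(len(rand_df))             # 3: the caller's DataFrame is untouched

mutate(rand_df)
print(list(rand_df.columns))    # ['A', 'B', 'C']
```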

The following code prints the object's ID before and after that assignment inside the function, showing that it changes:

import pandas as pd
rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})
rand_df = rand_df[rand_df['A'] > 2]
print(rand_df)


def some_fxn(df, col):
    print(id(df))
    df = df[df[col] > 2]
    print(id(df))
    
rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})
some_fxn(rand_df, 'A')
print(rand_df)

So you have no choice but to return the new value and assign it in the caller.