I am trying to remove outliers from columns. Let's say I have:

rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})

If I do:

rand_df = rand_df[rand_df['A'] > 2]

I get a new df, which is what I want. However, if I try:

def some_fxn(df, col):
    df = df[df[col] > 2]

some_fxn(rand_df, 'A')

My df is unaltered. What do I need to do to enable this function to operate properly?

3 Answers

Assigning to the parameter inside the function has no effect on the caller. Instead, return the filtered DataFrame and assign the result:

def some_fxn(df, col):
    return df[df[col] > 2]

rand_df = some_fxn(rand_df, 'A')  # reassign to rand_df to update it, or use another name to keep the original
anky

You need to use return at the end of your function. A function without an explicit return statement returns None.

def some_fxn(df, col):
    return df[df[col] > 2]

some_fxn(rand_df, 'A')

Out[412]: 
   A  B
2  3  6
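
The same point can be checked without pandas. The following is only a minimal sketch using plain lists; the function names filter_without_return and filter_with_return are hypothetical and exist purely for illustration.

```python
def filter_without_return(values, threshold):
    # The filtered list is bound to a local name and discarded on exit,
    # so the call evaluates to None.
    kept = [v for v in values if v > threshold]


def filter_with_return(values, threshold):
    # Returning the new list hands it back to the caller.
    return [v for v in values if v > threshold]


data = [1, 2, 3]
missing = filter_without_return(data, 2)
result = filter_with_return(data, 2)

print(missing)  # None
print(result)   # [3]
print(data)     # [1, 2, 3]
```

In both cases data itself is left alone; only the version with return gives the caller something to assign.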
sophocles

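You're expecting pass-by-reference behavior. Python has neither pass by reference nor pass by value in the usual sense: arguments are passed by assignment, so the parameter df is just a local name bound to the same object as rand_df. Assigning to df inside the function rebinds that local name and leaves rand_df alone.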

The following code prints the ID of the object before and after the assignment, showing where it changes:

import pandas as pd
rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})
rand_df = rand_df[rand_df['A'] > 2]
print(rand_df)


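def some_fxn(df, col):
    print(id(df))         # same id as the caller's rand_df
    df = df[df[col] > 2]  # rebinds the local name df to a new DataFrame
    print(id(df))         # a different id: the caller's object is untouched
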
rand_df = pd.DataFrame({"A": [1,2,3], 'B': [4,5,6]})
some_fxn(rand_df, 'A')
print(rand_df)

Since the boolean-mask filter creates a new object, the function has to return it and the caller has to assign the result.
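
The difference between rebinding a name and changing the object it refers to can also be seen with a plain list. This is a rough sketch, not pandas-specific; rebind and mutate are hypothetical helper names, and slice assignment stands in for any operation that modifies an object in place.

```python
def rebind(items):
    # Rebinds the local name only; the caller's list is not touched.
    items = [x for x in items if x > 2]
    return id(items)


def mutate(items):
    # Slice assignment replaces the contents of the existing list object.
    items[:] = [x for x in items if x > 2]
    return id(items)


first = [1, 2, 3]
rebound_id = rebind(first)
print(first, rebound_id == id(first))    # [1, 2, 3] False

second = [1, 2, 3]
mutated_id = mutate(second)
print(second, mutated_id == id(second))  # [3] True
```

The boolean-mask filter in the question behaves like rebind: it builds a new object and binds it to the local name.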

Thomas Weller