
I have a function into which I pass variables. I do not change the names of the variables.

I expect the variable inside the function to be treated as a local variable, but in several instances it appears to actually change the variable of the same name outside the function. I don't think this should be happening. Has anyone experienced this?

So, I have this:

def func(df1, df2, df3):
    df1.set_index('A', inplace=True)
    df2['c'] = df1['B'] * df3['G']
    return df2

I am finding that df1.set_index('A', inplace=True) changes df1 outside the function. So when I call the function again I get an error, because the function no longer "sees" a column 'A' in df1: the df1 passed in from the "outside" already had its index set to 'A' by the earlier call.

Anyone get this kind of memory bleed?

Windstorm1981
  • That's precisely what you're *explicitly telling* it to do, with `inplace=True`. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html. *"I expect the variable inside the function to be treated as a local variable"* - with mutable values, that's not a useful expectation. – jonrsharpe Jun 28 '18 at 11:14
  • When you pass `inplace=True`, it changes the original data, because both names refer to the same object. If you do not want that, assign the result to a variable and use `inplace=False`, e.g. `df1 = df1.set_index('A', inplace=False)` – Sumit S Chawla Jun 28 '18 at 11:16
  • Possible duplicate of [How do I pass a variable by reference?](https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference) – YSelf Jun 28 '18 at 11:17
  • You appear to be modifying `df2` but returning `df3` - this is pretty confusing! – asongtoruin Jun 28 '18 at 11:30
  • @sam @jonrsharpe so `inplace` transcends my local variable status? Instead of just changing the local variable of the same name as the one outside the function, it changes it in BOTH places? – Windstorm1981 Jun 28 '18 at 12:00

1 Answer


In Python, arguments are passed by assignment: the parameter is a new local name bound to the same object the caller passed (the reference itself is passed by value). So if you modify df1 in place inside the function, you are modifying the very DataFrame that was passed in as the argument. Only rebinding the name, e.g. `df1 = something_else`, leaves the caller's object alone.
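To make the difference between mutating and rebinding concrete, here is a minimal sketch. The function names `mutate` and `rebind` and the sample data are made up for illustration and are not from the question.

```python
import pandas as pd

def mutate(df):
    # In-place operation: changes the object the caller also refers to.
    df.set_index('A', inplace=True)

def rebind(df):
    # Rebinding: the local name now points at a new DataFrame;
    # the caller's DataFrame is left as it was.
    df = df.set_index('A')
    return df

outer = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

rebound = rebind(outer)
print('A' in outer.columns)   # True: outer still has column 'A'

mutate(outer)
print('A' in outer.columns)   # False: 'A' is now the index of outer
```

Calling `mutate(outer)` a second time would raise a KeyError, which is the error described in the question.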

You might want to work on a copy inside the function:

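def func(df1, df2, df3):
    # Work on a copy so the caller's df1 keeps its column 'A'.
    df1 = df1.copy()
    df1.set_index('A', inplace=True)

    # Note: this still adds column 'c' to the caller's df2.
    df2['c'] = df1['B'] * df3['G']

    return df2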

Note: the `deep` parameter of `DataFrame.copy()` defaults to True, so the data is duplicated and the copy uses extra memory.
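As an alternative to an explicit copy, you could avoid in-place operations altogether. The sketch below assumes, purely for illustration, that df2 and df3 are indexed by the same labels that appear in column 'A' of df1; the name `func_no_side_effects` and the sample data are hypothetical.

```python
import pandas as pd

def func_no_side_effects(df1, df2, df3):
    # set_index without inplace returns a new DataFrame; df1 is not modified.
    indexed = df1.set_index('A')
    # assign also returns a new DataFrame, so the caller's df2 is untouched.
    return df2.assign(c=indexed['B'] * df3['G'])

# Illustrative data only.
df1 = pd.DataFrame({'A': ['x', 'y'], 'B': [2, 3]})
df2 = pd.DataFrame({'D': [0, 0]}, index=['x', 'y'])
df3 = pd.DataFrame({'G': [10, 100]}, index=['x', 'y'])

first = func_no_side_effects(df1, df2, df3)
second = func_no_side_effects(df1, df2, df3)  # a repeated call works
print(first)
```

Whether this uses less memory than a deep copy depends on the pandas version, so treat it as a way to avoid the side effect and not as a guaranteed optimisation.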

harshil9968