I have noticed something odd in the behaviour of my code when using pandas DataFrames and lists. I don't know whether the two are related or whether the problem comes from something beyond my understanding. I would appreciate it if someone could explain the reason. My code is similar to the following example:

list_of_df=[]
for i in range(0,5):
    df=a_function(data)
    list_of_df.append(df)

a_function takes an initial DataFrame named "data", modifies it, and returns it. This is a silly example, but it shows the kind of operations I am performing:

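def a_function(data):
    data["new_column"] = 1              # adds a column to the frame that was passed in
    data.loc[:, "existing_column"] = 0  # overwrites an existing column in place
    return data                         # returns the same object, not a copy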

I expect list_of_df to end up as a list of different DataFrames. Instead, all the DataFrames in it are the same and equal to the one that was appended last.
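
A self-contained sketch that reproduces the behaviour described above. The small `data` frame, the literal column names, and the extra `value` parameter are assumptions added for illustration (they are not part of the original function); `value` is there only so that each call would be expected to yield a different frame:

```python
import pandas as pd

# Hypothetical stand-in for the real input data.
data = pd.DataFrame({"existing_column": [1, 2, 3]})

def a_function(data, value):
    # Both statements modify the frame that was passed in; no copy is made.
    data["new_column"] = value
    data.loc[:, "existing_column"] = 0
    return data

list_of_df = []
for i in range(5):
    list_of_df.append(a_function(data, i))

# Every list entry is the very same object as `data`.
same_object = all(df is data for df in list_of_df)
print(same_object)                           # True
print(list_of_df[0]["new_column"].tolist())  # [4, 4, 4], the last value written
```

The list therefore holds five references to one DataFrame, which is why every entry appears to equal the last one appended.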

When I use the following workaround, the code works as expected:

list_of_df=[]
for i in range(0,5):
    df=data.copy()
    df=a_function(df)
    list_of_df.append(df)

But I am not sure why. Thank you for your help!
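
A minimal sketch of why the `.copy()` workaround behaves differently, under the same kind of assumptions (a made-up `data` frame, literal column names, and a hypothetical `value` parameter that the original function does not have). Each call now receives its own fresh DataFrame, so the list ends up holding five independent objects and the original frame is left alone:

```python
import pandas as pd

# Hypothetical stand-in for the real input data.
data = pd.DataFrame({"existing_column": [1, 2, 3]})

def a_function(df, value):
    # Still modifies its argument in place, but the argument is now a copy.
    df["new_column"] = value
    df.loc[:, "existing_column"] = 0
    return df

# data.copy() makes a deep copy by default, so each call gets a new object.
list_of_df = [a_function(data.copy(), i) for i in range(5)]

distinct = len({id(df) for df in list_of_df}) == 5
first_values = list_of_df[0]["new_column"].tolist()
original_untouched = "new_column" not in data.columns

print(distinct)            # True
print(first_values)        # [0, 0, 0]
print(original_untouched)  # True
```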

edit: added more information about what a_function does to the DataFrame

Bravo1
  • In the first instance, you're just modifying a single DataFrame object instance. This is no different than how vanilla Python works. – roganjosh Apr 29 '18 at 15:25
  • I'm not sure if there's a good dupe, but maybe start with [this](https://nedbatchelder.com/text/names.html). It can be trickier with Pandas/Numpy since some operations will automatically create a copy of your object, while others will return a "view" on the original object, e.g. [this](http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html). It's pretty tough to be specific on this since you haven't actually given an example of what you're doing. Creating copies is expensive so, where possible, the library will try to let you operate on the original object, leaving copying up to you. – roganjosh Apr 29 '18 at 15:28
  • This is a Pandas question in that you need to ensure the function is returning copies. But this is mostly a Python question and is explained by @roganjosh’s link. This is as close to a dupe target as we can get: https://stackoverflow.com/q/240178/2336654 – piRSquared Apr 29 '18 at 15:38
  • @roganjosh Thank you for your answer, I understand a bit better now. In the function I am basically just modifying columns using the `.loc` accessor. The thing is, I thought that arguments are passed to functions by value in Python, so I assumed that when working on "data" inside my function, it was already a copy. – Bravo1 Apr 29 '18 at 15:51
  • @Bravo1 No, if the argument is mutable (like a list, a dictionary or a DataFrame) you can modify it inside a function. It is not a copy. – ayhan Apr 29 '18 at 16:31
  • Thank you guys for your help, it is very clear now. @ayhan – Bravo1 Apr 29 '18 at 16:36
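
The point made in the comments above can be sketched in plain Python, without pandas. The function names `mutate` and `rebind` are hypothetical, chosen only for this illustration. A function receives a reference to the caller's object, so mutating that object is visible outside the function, whereas rebinding the local name is not:

```python
def mutate(items):
    # Changes the object the caller passed in.
    items.append(99)

def rebind(items):
    # `items + [99]` builds a new list; only the local name is rebound.
    items = items + [99]
    return items

original = [1, 2, 3]

mutate(original)
after_mutate = list(original)  # snapshot: [1, 2, 3, 99]

result = rebind(original)
print(after_mutate)        # [1, 2, 3, 99]
print(original)            # [1, 2, 3, 99], unchanged by rebind
print(result)              # [1, 2, 3, 99, 99]
print(result is original)  # False
```

A function that assigns columns on a DataFrame argument is in the same position as `mutate` here, which is why an explicit `.copy()` is needed when independent results are wanted.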
