I have noticed something odd in the behaviour of my code when using pandas DataFrames and lists. I don't know whether the two are related or whether the problem comes from something I don't understand. I would appreciate it if someone could explain the reason. My code is similar to the following example:
list_of_df = []
for i in range(0, 5):
    df = a_function(data)
    list_of_df.append(df)
a_function takes an initial DataFrame named "data", modifies it, and returns it. This is a silly example, but it shows the kind of operations I am performing:
def a_function(data):
    data[new_column] = 1
    data.loc[:, existing_column] = 0
    return data
I expect list_of_df to end up as a list of different DataFrames. Instead, all the DataFrames in the list are identical and equal to the one that was appended last.
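Here is a minimal, self-contained sketch that reproduces what I am seeing. The sample data, the column names, and the extra value argument (used only so that each iteration writes something different) are made up for illustration and are not my real code:

```python
import pandas as pd

# Made-up data and column names, for illustration only.
data = pd.DataFrame({"existing_column": [1, 2, 3]})

def a_function(data, value):
    # Hypothetical variant of a_function: `value` changes on every call.
    # Both assignments act on the DataFrame that was passed in.
    data["new_column"] = value
    data.loc[:, "existing_column"] = 0
    return data

list_of_df = []
for i in range(0, 5):
    df = a_function(data, i)
    list_of_df.append(df)

# Every entry in the list is the very same object as `data`.
print(all(df is data for df in list_of_df))   # True
# So the first entry shows the value written in the last iteration.
print(list_of_df[0]["new_column"].tolist())   # [4, 4, 4]
```

In this sketch the list holds five references to one DataFrame rather than five different DataFrames, which matches the behaviour described above.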
When I use the following workaround, the code works as expected:
list_of_df = []
for i in range(0, 5):
    df = data.copy()
    df = a_function(df)
    list_of_df.append(df)
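A runnable sketch of this workaround, using made-up sample data, made-up column names, and a hypothetical value argument so that each iteration produces a different result (none of these come from my real code):

```python
import pandas as pd

# Made-up data and column names, for illustration only.
data = pd.DataFrame({"existing_column": [1, 2, 3]})

def a_function(data, value):
    # Hypothetical variant of a_function: `value` changes on every call.
    data["new_column"] = value
    data.loc[:, "existing_column"] = 0
    return data

list_of_df = []
for i in range(0, 5):
    df = data.copy()          # work on an independent copy each time
    df = a_function(df, i)
    list_of_df.append(df)

# Each entry now keeps the value from its own iteration.
print([df["new_column"].tolist()[0] for df in list_of_df])  # [0, 1, 2, 3, 4]
# The original DataFrame is left untouched.
print(data.columns.tolist())                                # ['existing_column']
print(data["existing_column"].tolist())                     # [1, 2, 3]
```

With the copy in place, the list entries are distinct objects and the original data is not changed by the loop.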
But I am not sure why. Thank you for your help!
Edit: added more information about what a_function does to the DataFrame.