Pandas - inplace, view, copy confusion

Question

I'm having an issue with Pandas DataFrames. It seems that Pandas/Python creates a copy of the DF somewhere in my code instead of applying the modifications to the original DF.

In the code below, "update_df" still sees the DF with a "file_exists" column, which should have been removed by the previous function.

MAIN:

if __name__ == '__main__':
    df_main = load_df()
    clean_df2(df_main)
    update_df(df_main, image_path_main)
    .....

clean_df2

def clean_df2(df): #remove non-existing files from DF
    df['file_exists'] = True # add column, set all to True?
    .....
    df = df[df['file_exists'] != False] #Keep only records that exist
    df.drop('file_exists', 1, inplace=True)  # delete the temporary column
    df.reset_index(drop=True, inplace = True)  # reindex if source has gaps

update_df:

def update_df(df, image_path): #add DF rows for files not yet in DF
    print(df)
    ....

Allen Qin · Answer 1 · 2017-05-08T20:34:56.170


I think when you do:

df = df[df['file_exists'] != False]

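You've created a new DataFrame and rebound the local name df to it. The caller's df_main still refers to the original object, which already had the file_exists column added in place, so the drop and reset_index that follow only affect the new local object.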

To make it work, you can change your function to:

def clean_df2(df): #remove non-existing files from DF
    df['file_exists'] = True # add column, set all to True?
    .....
    return df

Then, at the call site, assign the result back to your variable (df_main in your script):

df = clean_df2(df)
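
To make the difference concrete, here is a minimal sketch of both behaviours. The three-row frame, the hard-coded True/False values, and the function names clean_rebind and clean_return are made up for illustration; the real existence check is elided in the question, so it is not reproduced here.

```python
import pandas as pd


def clean_rebind(df):
    # Column assignment mutates the caller's object, so this column leaks out.
    df['file_exists'] = [True, False, True]  # stand-in for the real file check
    # Boolean indexing builds a new DataFrame; only the local name is rebound.
    df = df[df['file_exists']]
    # These steps act on the new object, which is discarded on return.
    df = df.drop(columns='file_exists').reset_index(drop=True)


def clean_return(df):
    # Same filtering, but the result is handed back to the caller.
    exists = pd.Series([True, False, True], index=df.index)  # stand-in check
    return df[exists].reset_index(drop=True)


df_a = pd.DataFrame({'path': ['a.jpg', 'b.jpg', 'c.jpg']})
clean_rebind(df_a)
print(list(df_a.columns), len(df_a))  # ['path', 'file_exists'] 3

df_b = pd.DataFrame({'path': ['a.jpg', 'b.jpg', 'c.jpg']})
df_b = clean_return(df_b)
print(list(df_b.columns), len(df_b))  # ['path'] 2
```

The first call reproduces the symptom from the question: the temporary column is still there and no rows were removed. The second only works because the caller reassigns the returned frame.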

edited May 08 '17 at 20:34

answered May 08 '17 at 20:27


Better yet, change the line to `df.drop(df[df['file_exists'] == False].index, inplace=True)` – juanpa.arrivillaga May 08 '17 at 21:03
That's another option. A copy can be easily made inside a function. Returning the final df is probably a safer option. – Allen Qin May 08 '17 at 21:16
I suppose, but oftentimes you want to avoid copying dataframes for memory reasons. – juanpa.arrivillaga May 08 '17 at 21:17
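
For completeness, a rough sketch of the in-place variant discussed in these comments. It assumes the existence flags are available as a boolean Series aligned with the frame's index; the name clean_inplace and the sample data are hypothetical. Note that df.drop expects index labels, not a boolean mask, so the mask is converted to labels first.

```python
import pandas as pd


def clean_inplace(df, exists):
    # `exists` is a hypothetical boolean Series aligned with df's index.
    missing = df.index[~exists.to_numpy()]  # labels of rows to remove
    # Both calls modify the object the caller passed in; no rebinding occurs.
    df.drop(index=missing, inplace=True)
    df.reset_index(drop=True, inplace=True)


df_c = pd.DataFrame({'path': ['a.jpg', 'b.jpg', 'c.jpg']})
clean_inplace(df_c, pd.Series([True, False, True]))
print(df_c['path'].tolist())  # ['a.jpg', 'c.jpg']
```

This keeps the caller's variable pointing at the cleaned data without a return value. Whether it actually saves memory is less clear, since inplace=True may still build new data internally, so treat that benefit as an assumption to measure rather than a guarantee.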
