As noted in this question, it is possible to explicitly release the memory of a dataframe. I am running into an issue which is a bit of an extension of that problem. I often import a whole data set and then make a selection from it. The selections tend to come in two forms:
df_row_slice = df.sample(frac=0.6)
df_column_slice = df[columns]
Past a certain point in my code I know that I will no longer make any reference to the original df. Is there a way to release all of the memory that is not referenced by the slices? I realize I could .copy() when I slice, but that temporary duplication would cause me to exceed my available memory.
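For concreteness, this is the copy-then-delete pattern I am trying to avoid. The toy frame, the column names, and the sizes are just stand-ins for my real data:

import numpy as np
import pandas as pd

# toy stand-in for the real data set; shapes and names are only for illustration
df = pd.DataFrame(np.random.rand(1_000_000, 10),
                  columns=[f"col{i}" for i in range(10)])
columns = ["col0", "col1", "col2"]

# explicit copies would let me delete df afterwards, but df and both
# copies coexist at this point, which is what blows my memory budget
df_row_slice = df.sample(frac=0.6).copy()
df_column_slice = df[columns].copy()
del df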
UPDATE
Following the reply, I think the approach would be to drop the unneeded columns or rows from the original frame:
df_column_slice = df[columns]
cols_to_drop = [i for i in df.columns if i not in columns]
df = df.drop(columns=cols_to_drop)
or
df_row_slice = df.sample(frac=0.6)
df = df.drop(df_row_slice.index)
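As a rough sanity check (reusing df and columns from above), I was thinking of comparing the frame's own reported memory before and after the drop. I know memory_usage(deep=True) is only per-column accounting, not the actual process footprint, so this is just a sketch:

# rough check: compare the frame's reported memory before and after the drop
before = df.memory_usage(deep=True).sum()

df_column_slice = df[columns]
cols_to_drop = [c for c in df.columns if c not in columns]
df = df.drop(columns=cols_to_drop)

after = df.memory_usage(deep=True).sum()
print(f"kept {after / before:.0%} of the originally reported bytes")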
Hopefully garbage collection then kicks in properly and frees up the memory. Would it be smart to call
import gc
gc.collect()
just to be safe? Does the order matter? I could drop before slicing without a problem. In my specific case I make several slices of both types, so my hope is that I could simply del df and the memory management would do something like this under the hood.
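Put together, this is a sketch of what I am hoping works in my several-slices case, not something I have verified actually frees the memory (df_another_row_slice is just an illustrative extra name):

import gc

# several slices of both kinds, taken while the full frame is still around
df_row_slice = df.sample(frac=0.6)
df_another_row_slice = df.sample(frac=0.1)
df_column_slice = df[columns]

# once nothing in my code references the full frame any more, drop the
# name and ask the collector to reclaim whatever is now unreachable
del df
gc.collect()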