We have a massive pandas DataFrame in our code, with shape (102730344, 50). To free up memory, we del this DataFrame once it is no longer needed. That del statement currently takes 4 hours to run, even on powerful hardware. Is there a way to speed this up?
Here's the code flow:
big_data_df, small_df, medium_data, smaller_df = get_data(params)
# (intermediate code commented out for testing)
del big_data_df # this takes 4 hours
We call a function that returns four DataFrames, one of which is the big DataFrame we later want to delete. For testing, we've commented out all the code between getting the DataFrame and deleting it. The del then runs, and a logging statement immediately after it shows a runtime of 4 hours.
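To narrow down where the time goes, the measurement could be split into two parts: the del itself and an explicit garbage-collection pass. The following is only a minimal sketch under stated assumptions. The helper name time_release and the small stand-in frame are hypothetical and not part of our real code, and a small all-numeric frame will not necessarily reproduce the 4-hour behavior.

```python
import gc
import time

import numpy as np
import pandas as pd


def time_release(make_frame):
    """Build a frame, then time `del` and `gc.collect()` separately.

    `make_frame` is a hypothetical zero-argument callable that returns
    the DataFrame to release (a stand-in for get_data(params)).
    """
    frame = make_frame()

    # Time only the removal of the last reference to the frame.
    start = time.perf_counter()
    del frame
    del_seconds = time.perf_counter() - start

    # Time an explicit full collection pass afterwards.
    start = time.perf_counter()
    gc.collect()
    gc_seconds = time.perf_counter() - start

    return del_seconds, gc_seconds


if __name__ == "__main__":
    # Small stand-in frame (assumption): 100,000 rows x 50 float columns.
    del_s, gc_s = time_release(lambda: pd.DataFrame(np.zeros((100_000, 50))))
    print(f"del: {del_s:.6f} s, gc.collect(): {gc_s:.6f} s")
```

Running the same helper against the real get_data(params) output would show whether the time is spent in the del statement or in a later collection pass. Note that del only removes one name; the memory is released only when no other references to the DataFrame remain.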