I load a large DataFrame from a pickle file in Python 3 and take a small subset of it.
I would expect Python to remove the original large DataFrame object from memory, including its name, reference and value, once I delete it. However, this is not happening: the process memory does not decrease. Why?
This is a serious problem for me, and I do not know how to release the memory. This is the output:
before 130564096
loading files
just before taking subset 3827941376
7258946128
56
after 3803156480
This is the code:
import gc
import os
import pickle
from os import path

import pandas as pd
import psutil
from pympler.asizeof import asizeof

process = psutil.Process(os.getpid())
basepath = os.getcwd()
print("before", process.memory_info().rss)

def load_files(file_name, file_ext):
    filename = "%s.%s" % (file_name, file_ext)
    filepath = path.abspath(path.join(basepath, "..", "..", "data_input", filename))
    with open(filepath, 'rb') as pickle_load:
        df = pickle.load(pickle_load)
    print("just before taking subset", process.memory_info().rss)
    print(asizeof(df))
    df2 = df[:100].copy(deep=True)   # keep only a 100-row copy
    del df                           # drop the reference to the full DataFrame
    gc.collect()
    df = pd.DataFrame()
    df = ''
    gc.collect()
    print(asizeof(df))
    print("after", process.memory_info().rss)
    exit()
    return df2
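
For comparison, one workaround I am aware of is to do the expensive load in a short-lived child process, so that all of its memory is returned to the OS when the process exits and only the small subset is sent back to the parent. This is only a minimal sketch under assumptions (the path handling and the 100-row slice mirror the function above; the function names are hypothetical), not code from my project:

import multiprocessing as mp
import os
import pickle
from os import path

def _load_subset(filepath, queue):
    # Runs in the child process: load the full DataFrame, send back only the slice.
    with open(filepath, 'rb') as pickle_load:
        df = pickle.load(pickle_load)
    queue.put(df[:100].copy(deep=True))

def load_files_subprocess(file_name, file_ext):
    filename = "%s.%s" % (file_name, file_ext)
    filepath = path.abspath(path.join(os.getcwd(), "..", "..", "data_input", filename))
    queue = mp.Queue()
    proc = mp.Process(target=_load_subset, args=(filepath, queue))
    proc.start()
    df2 = queue.get()   # receive the 100-row copy before joining, to avoid a queue deadlock
    proc.join()         # child exits here, releasing the full DataFrame's memory to the OS
    return df2

(On platforms that use the spawn start method, this should be called from under an if __name__ == "__main__": guard.)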