I have a piece of code that receives callbacks from another function and builds a list of lists (pd_arr). This list is then used to create a DataFrame. Finally, the list of lists is deleted.
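The flow described above looks roughly like the following. This is a minimal sketch, not the actual code: the callback name `add_to_list` comes from the profiler output, but its signature, the column names in `df_columns` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical column names; the first one becomes the index.
df_columns = ["id", "price", "qty"]

# List of lists filled by the callback, one inner list per row.
pd_arr = []


def add_to_list(row):
    """Hypothetical callback, invoked once per incoming row."""
    pd_arr.append(list(row))


# Simulate the other function firing the callback a few times.
for i in range(5):
    add_to_list((i, 10.0 * i, i + 1))

# pandas builds new column arrays from the rows here.
pd_df = pd.DataFrame(pd_arr, columns=df_columns)
# set_index returns a new DataFrame; the previous one is dropped.
pd_df = pd_df.set_index(df_columns[0])
# The list of lists is no longer needed once the DataFrame exists.
del pd_arr

print(pd_df.shape)  # (5, 2)
```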
Profiling with memory-profiler gives this output:
102.632812 MiB 0.000000 MiB init()
236.765625 MiB 134.132812 MiB add_to_list()
return pd.DataFrame()
394.328125 MiB 157.562500 MiB pd_df = pd.DataFrame(pd_arr, columns=df_columns)
350.121094 MiB -44.207031 MiB pd_df = pd_df.set_index(df_columns[0])
350.292969 MiB 0.171875 MiB pd_df.memory_usage()
350.328125 MiB 0.035156 MiB print sys.getsizeof(pd_arr), sys.getsizeof(pd_arr[0]), sys.getsizeof(pd_df), len(pd_arr)
350.328125 MiB 0.000000 MiB del pd_arr
Checking the deep memory usage of pd_df (the DataFrame) gives 80.5 MB. So my question is: why does the memory not decrease after the del pd_arr line?
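For reference, the per-object figures can be compared along these lines. `sys.getsizeof` is shallow (for a list it counts the list object and its pointer array, not the elements), while `DataFrame.memory_usage(deep=True)` also counts the contents of object columns. The helper `deep_list_size` and the sample data are hypothetical, and memory_profiler samples the memory of the whole process, so these estimates need not line up with its increments.

```python
import sys

import pandas as pd


def deep_list_size(rows):
    """Rough deep size of a list of lists, in bytes.

    Counts the outer list, each inner list and each element once per
    occurrence, so shared objects (e.g. cached small ints) are over-counted.
    """
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        total += sum(sys.getsizeof(item) for item in row)
    return total


# Hypothetical sample data: an int, a float and a string per row.
rows = [[i, float(i), str(i)] for i in range(100_000)]
df = pd.DataFrame(rows, columns=["a", "b", "c"])

mib = 2 ** 20
print("list, shallow:", sys.getsizeof(rows) / mib)
print("list, deep (approx.):", deep_list_size(rows) / mib)
print("DataFrame, shallow:", df.memory_usage().sum() / mib)
print("DataFrame, deep:", df.memory_usage(deep=True).sum() / mib)
```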
Also, the DataFrame's size according to the profiler (157.6 - 44.2 ≈ 113 MiB) is larger than 80.5 MB. What causes the difference?
Also, is there a more memory-efficient way to build a DataFrame from data received in a loop, without too large a time penalty? (For example, an extra few tens of seconds would be acceptable for a 100 MB DataFrame.)
Edit: here is a simple Python script that reproduces this behaviour:
Filename: py_test.py
Line # Mem usage Increment Line Contents
================================================
9 102.0 MiB 0.0 MiB @profile
10 def setup():
11 global arr, size
12 102.0 MiB 0.0 MiB arr = range(1, size)
13 131.2 MiB 29.1 MiB arr = [x+1 for x in arr]
Filename: py_test.py
Line # Mem usage Increment Line Contents
================================================
21 131.2 MiB 0.0 MiB @profile
22 def tearDown():
23 global arr
24 131.2 MiB 0.0 MiB del arr[:]
25 131.2 MiB 0.0 MiB del arr
26 93.7 MiB -37.4 MiB gc.collect()
With a DataFrame step added, the output is:
Filename: py_test.py
Line # Mem usage Increment Line Contents
================================================
9 102.0 MiB 0.0 MiB @profile
10 def setup():
11 global arr, size
12 102.0 MiB 0.0 MiB arr = range(1, size)
13 132.7 MiB 30.7 MiB arr = [x+1 for x in arr]
Filename: py_test.py
Line # Mem usage Increment Line Contents
================================================
15 132.7 MiB 0.0 MiB @profile
16 def dfCreate():
17 global arr
18 147.1 MiB 14.4 MiB pd_df = pd.DataFrame(arr)
19 147.1 MiB 0.0 MiB return pd_df
Filename: py_test.py
Line # Mem usage Increment Line Contents
================================================
21 147.1 MiB 0.0 MiB @profile
22 def tearDown():
23 global arr
24 #del arr[:]
25 147.1 MiB 0.0 MiB del arr
26 147.1 MiB 0.0 MiB gc.collect()