I've written a script that reads many types of files and writes them, file by file, to an Excel sheet inside a for loop. The problem is that after I finish writing the data to disk, the allocated memory is still occupied, even though I call `del` and `gc.collect()`. I ran the memory profiler, and it showed that the memory is not freed when the variables holding the data are deleted. What is the reason?

Here are excerpts of the memory profiler output for two files:

First file:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   167    121.7 MiB    121.7 MiB           1   @profile
   168                                         def read_file(f_path: str):
...
   178    121.7 MiB      0.0 MiB           1       file_output = ""
...
   202    121.7 MiB      0.0 MiB           1           elif ext == ".pdf":
   203    330.7 MiB    209.0 MiB           1               file_output = read_pdf(f_path)
...
   212    330.7 MiB      0.0 MiB           1       return file_output

Second file:

   202    331.7 MiB      0.0 MiB           1           elif ext == ".pdf":
   203    478.4 MiB    146.7 MiB           1               file_output = read_pdf(f_path)

Around the gc.collect() call:

   147    478.4 MiB      0.0 MiB           2           output_csv_path = os.path.join(output_path, f"data_csv_temp{p}.csv")
   148    478.4 MiB      0.1 MiB           2           df.to_csv(output_csv_path, index=False)
   149    478.4 MiB      0.0 MiB           2           del file_output
   150    478.4 MiB      0.0 MiB           2           del df
   151    478.4 MiB      0.0 MiB           2           gc.collect()

Thanks in advance!

Esraa Abdelmaksoud
  • `del` doesn't necessarily release the memory. E.g., if you run `a = [1,2,3]; b=a; del a`, then `b == [1,2,3]` still holds. It's hard to say what's going on without seeing the rest of your code. You might be holding on to a reference to that memory somewhere else. – Loocid Aug 25 '23 at 04:04
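  • Check the refcount of the dataframes you're trying to garbage collect: `import sys; sys.getrefcount(df)`. Keep in mind that the reported count is usually one higher than expected, because passing `df` to the function creates a temporary reference. If the count is still higher than that, another reference exists and the object will not be garbage collected. – NotAName Aug 25 '23 at 05:08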
  • The garbage collector uses heuristic algorithms under the hood, and its behavior varies between distributions. Removing all references is not guaranteed to release the memory. If you need explicit memory management, Python is the wrong tool; maybe consider an implementation in C++? If memory is a problem, consider using generators and processing files line by line. You might want to take a look at: https://stackoverflow.com/q/1316767/15923186 – Gameplay Aug 25 '23 at 06:19
  • Thanks a lot to all of you. I found that it was referenced somewhere else when I used `sys.getrefcount()` as recommended. – Esraa Abdelmaksoud Aug 25 '23 at 21:25
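
To illustrate what the comments describe, here is a minimal sketch using only the standard library. It is not the original script: `Payload`, `cache`, `is_freed_after_del`, and `referrer_types` are hypothetical names, and `Payload` merely stands in for a large object such as a DataFrame. The sketch assumes CPython, where an object is destroyed as soon as its last strong reference goes away.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as a DataFrame."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


def is_freed_after_del(keep_alias: bool) -> bool:
    """Return True if the Payload is destroyed once its main name is deleted."""
    cache = []                  # hypothetical "somewhere else" that may hold a reference
    obj = Payload(1_000_000)
    probe = weakref.ref(obj)    # weak reference: observes obj without keeping it alive
    if keep_alias:
        cache.append(obj)       # second strong reference to the same object
    del obj                     # removes only the name 'obj', not necessarily the object
    gc.collect()
    return probe() is None      # None means the object has been destroyed


def referrer_types(obj) -> set:
    """Return the type names of GC-tracked objects that currently refer to obj."""
    return {type(r).__name__ for r in gc.get_referrers(obj)}


if __name__ == "__main__":
    print("freed without alias:", is_freed_after_del(keep_alias=False))
    print("freed with alias:   ", is_freed_after_del(keep_alias=True))

    holder = [Payload(10)]
    print("referrers:", referrer_types(holder[0]))
```

When an unexpected reference shows up, `gc.get_referrers` can help locate which container is holding it, which is often quicker than reading a raw count from `sys.getrefcount`.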
