I want to process many mp3 files in a loop in a Jupyter Notebook on Kaggle. However, reading an mp3 file as binary seems to keep the file in memory, even after the function has returned and the file has been properly closed. This causes memory usage to grow with each file processed. The issue appears to be the read() call, since replacing it with a pass does not cause any memory growth.
While looping through the mp3 files, memory usage grows by roughly the total size of the files processed, which suggests the file contents are being kept in memory.
How do I read a file without it being kept in memory after the function returns?
def read_mp3_as_bin(fname):
    with open(fname, "rb") as f:
        data = f.read()  # when using 'pass' instead, memory usage doesn't grow
    print(f.closed)  # prints True: the file is closed at this point
    return

for fname in file_names:  # file_names are 25K paths to the mp3 files
    read_mp3_as_bin(fname)
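For anyone who wants to reproduce the observation, here is a minimal sketch of how the growth can be tracked while looping. It assumes the psutil package is available (it normally is in Kaggle notebooks); the rss_mb helper and the print interval are just illustrative.

import os
import psutil

def rss_mb():
    # Resident set size of the current notebook process, in MB
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2

for i, fname in enumerate(file_names):
    read_mp3_as_bin(fname)
    if i % 1000 == 0:
        print(f"{i} files processed, RSS: {rss_mb():.0f} MB")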
"SOLUTION"
I ran this exact code locally and saw no memory usage growth at all. It therefore looks like Kaggle handles files differently, since the environment is the only variable in this test. I will try to find out why this code behaves differently on Kaggle and will post an update when I know more.
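One way to narrow this down is to check whether the growth shows up in Python's own allocations at all. The sketch below uses only the standard-library tracemalloc and gc modules (the subset of 1000 files is arbitrary): if the traced Python allocations stay flat while the notebook's reported memory keeps climbing, the data is not being held by Python objects but somewhere lower down (allocator, OS page cache, etc.).

import gc
import tracemalloc

tracemalloc.start()

for fname in file_names[:1000]:  # a subset is enough to see the trend
    read_mp3_as_bin(fname)

gc.collect()  # make sure nothing is merely waiting to be collected
current, peak = tracemalloc.get_traced_memory()
print(f"Python-level allocations: current {current / 1024 ** 2:.1f} MB, "
      f"peak {peak / 1024 ** 2:.1f} MB")
tracemalloc.stop()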