
I would like to process a large database of .tdms files (read a file, store the values, go to the next file, then plot/process the values). However, my program keeps crashing with the same error:

MemoryError: Unable to allocate 24.4 GiB for an array with shape (399229, 4096) and data type complex128

Since I would like to read hundreds of files, process them and plot the result, I understand this takes a lot of memory, but how could I avoid this error? I'm guessing there might be a way to "clean", from time to time, some of the things Python keeps in memory during the program?

Below is a minimal example showing how I read the data. If I run this loop on just a few files it works fine, but as soon as there are more than a few hundred files I get this error after a few hours and the program stops.

# imports needed to run this snippet
import glob
import os

import matplotlib.pyplot as plt
import numpy as np
from nptdms import TdmsFile

print('start')

sourdir = 'C:/.../raw_data'

# list the .tdms files and sort them by modification time
listTdmsFiles = glob.glob(sourdir + '/*.tdms')
nbTdmsFiles = len(listTdmsFiles)
sorted_by_mtime_ascending = sorted(listTdmsFiles, key=lambda t: os.stat(t).st_mtime)

don = np.arange(nbTdmsFiles)

fichii = []
Vrot = []
AXMO = []
T_OFF = []
T_ON = []

for fich in don:
    plt.close('all')
    print(fich)
    # reset the per-file variables
    dT1=[]; dT2=[]; dT3=[]; dT4=[]; dT5=[]; dT6=[]; dT7=[]; dT8=[]; tdms_file=[]; group=[]; data=[]
    if os.path.isfile(sorted_by_mtime_ascending[fich]):

        # extract the test number from the file name
        filename = sorted_by_mtime_ascending[fich].replace(sourdir + "\\", "")
        filename = filename.replace(".tdms", "")
        testnumber = filename.replace("essais", "")

        # read the file: group 0 holds the header, group 1 the measurement data
        tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
        header = tdms_file.groups()[0].as_dataframe()
        data = tdms_file.groups()[1].as_dataframe()
        tdms_file = []

        # accumulate the header values of each file
        Vrot = np.append(Vrot, header['VitesseMoteur'])
        AXMO = np.append(AXMO, header['IncrementAXMO'])
        T_OFF = np.append(T_OFF, header['T_OFF'])
        T_ON = np.append(T_ON, header['T_ON'])

        plt.close('all')

        fichii = np.append(fichii, testnumber)

#%% data visualisation
print("end")

Thanks

1 Answer


You can try these:

  • The latest Python versions have better memory management, so use the most recent version you can.
  • Make sure you close all files after use.
  • Use the del keyword to drop any reference to the actual data, since Python won't free the RAM as long as something still points to it.
  • Call gc.collect() to force garbage collection after using del (see the sketch right after this list).
  • Use pickle or another serialiser to dump your data to files if you still need it later.
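As a concrete illustration of the del / gc.collect() points above, here is a minimal sketch (not the answerer's code; it reuses the variable names from the question and assumes the same nptdms reading pattern):

import gc

for fich in don:
    tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
    header = tdms_file.groups()[0].as_dataframe()
    data = tdms_file.groups()[1].as_dataframe()

    # ... extract what you need from header/data here ...

    # drop every reference to the large objects, then force a collection
    del tdms_file, header, data
    gc.collect()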
  • So that means that using "tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])" opens the file, and that I need to close it afterwards? And the del keyword, do I use it for example on "data", "header" and "tdms_file"? I don't know gc.collect, I guess I put it at the end of each loop? I don't know pickle either. – chang thenoob Feb 16 '22 at 13:30
  • In your case I believe you only need to add this at the end of the loop: `del header`, `del data`, `gc.collect()`. – Rémi Dion-Déry Feb 16 '22 at 16:54
  • I did this, still the issue... :/ I even tried to free memory by calling tdms_file.close() at the end, no difference. – chang thenoob Feb 17 '22 at 16:11
  • Then your dataframes are still referenced somehow (maybe because of `Vrot`, `T_OFF`, ...). [This seems to be linked to your problem](https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe). – Rémi Dion-Déry Feb 17 '22 at 16:32
  • You can try using np.array instead of plain Python lists; np.array seems to force a copy instead of keeping a reference. – Rémi Dion-Déry Feb 17 '22 at 16:37
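A sketch of the idea in the last two comments, under the assumption that each header column holds a single value per file: copy plain Python scalars out of the header DataFrame, so nothing appended to the result lists keeps the DataFrame (and the large TDMS buffers behind it) alive, and only build NumPy arrays once the loop is done.

import gc

Vrot, AXMO, T_OFF, T_ON = [], [], [], []

for fich in don:
    tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
    header = tdms_file.groups()[0].as_dataframe()

    # assumes each header column holds one value per file;
    # float() returns a plain Python scalar, so the lists never hold a
    # reference back into the DataFrame
    Vrot.append(float(header['VitesseMoteur'].iloc[0]))
    AXMO.append(float(header['IncrementAXMO'].iloc[0]))
    T_OFF.append(float(header['T_OFF'].iloc[0]))
    T_ON.append(float(header['T_ON'].iloc[0]))

    del tdms_file, header
    gc.collect()

# convert to NumPy arrays only once, after the loop
Vrot = np.array(Vrot)
AXMO = np.array(AXMO)
T_OFF = np.array(T_OFF)
T_ON = np.array(T_ON)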