
I would like to process a large database of .tdms files (read a file, store the values, go to the next file, then plot/process the values). However, my program keeps crashing with the same error:

MemoryError: Unable to allocate 24.4 GiB for an array with shape (399229, 4096) and data type complex128

Since I would like to read hundreds of files, process them and plot the result, I understand this takes a lot of memory, but how could I avoid this error? I'm guessing there might be a way to "clean", from time to time, some of the things Python keeps in memory during the program?

Below is a minimal example showing how I read the data. If I run this loop on just a few files it works fine, but as soon as there are more than a few hundred files I get this error after a few hours and the program stops.

# imports needed to run this snippet
import glob
import os

import matplotlib.pyplot as plt
import numpy as np
from nptdms import TdmsFile

print('start')

sourdir = 'C:/.../raw_data'

# list the .tdms files and sort them by modification time
listTdmsFiles = glob.glob(sourdir + '/*.tdms')
nbTdmsFiles = len(listTdmsFiles)
sorted_by_mtime_ascending = sorted(listTdmsFiles, key=lambda t: os.stat(t).st_mtime)

don = np.arange(nbTdmsFiles)

fichii = []
Vrot = []
AXMO = []
T_OFF = []
T_ON = []

for fich in don:
    plt.close('all')
    print(fich)
    # reset the per-file variables
    dT1=[]; dT2=[]; dT3=[]; dT4=[]; dT5=[]; dT6=[]; dT7=[]; dT8=[]; tdms_file=[]; group=[]; data=[]
    if os.path.isfile(sorted_by_mtime_ascending[fich]):

        # extract the test number from the file name
        filename = sorted_by_mtime_ascending[fich].replace(sourdir + "\\", "")
        filename = filename.replace(".tdms", "")
        testnumber = filename.replace("essais", "")

        # read the file: group 0 holds the header, group 1 the measurement data
        tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
        header = tdms_file.groups()[0].as_dataframe()
        data = tdms_file.groups()[1].as_dataframe()
        tdms_file = []

        # accumulate the header values of each file
        Vrot = np.append(Vrot, header['VitesseMoteur'])
        AXMO = np.append(AXMO, header['IncrementAXMO'])
        T_OFF = np.append(T_OFF, header['T_OFF'])
        T_ON = np.append(T_ON, header['T_ON'])

        plt.close('all')

        fichii = np.append(fichii, testnumber)

#%% data visualisation
print("end")

Thanks

1 Answer


You can try these:

  • The latest Python versions have better memory management, so use the most recent version you can.
  • Make sure you close all files after use.
  • Use the del keyword to drop any reference to the actual data, since Python won't free the RAM as long as something still points to it.
  • Call gc.collect() to force garbage collection after using del (see the sketch right after this list).
  • Use pickle or another serialiser to dump your data to files if you still need it later.
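As a concrete illustration of the del / gc.collect() points above, here is a minimal sketch (not the answerer's code; it reuses the variable names from the question and assumes the same nptdms reading pattern):

import gc

for fich in don:
    tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
    header = tdms_file.groups()[0].as_dataframe()
    data = tdms_file.groups()[1].as_dataframe()

    # ... extract what you need from header/data here ...

    # drop every reference to the large objects, then force a collection
    del tdms_file, header, data
    gc.collect()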
  • So that means that using "tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])" opens the file, and that I need to close it afterwards? And the del keyword, do I use it for example on "data", "header" and "tdms_file"? I don't know gc.collect, I guess I put it at the end of each loop? I don't know pickle either. – chang thenoob Feb 16 '22 at 13:30
  • In your case I believe you only need to add this at the end of the loop: `del header`, `del data`, `gc.collect()`. – Rémi Dion-Déry Feb 16 '22 at 16:54
  • I did this, still the issue... :/ I even tried to free memory by calling tdms_file.close() at the end, no difference. – chang thenoob Feb 17 '22 at 16:11
  • Then your dataframes are still referenced somehow (maybe because of `Vrot`, `T_OFF`, ...). [This seems to be linked to your problem](https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe). – Rémi Dion-Déry Feb 17 '22 at 16:32
  • You can try using np.array instead of plain Python lists; np.array seems to force a copy instead of keeping a reference. – Rémi Dion-Déry Feb 17 '22 at 16:37
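A sketch of the idea in the last two comments, under the assumption that each header column holds a single value per file: copy plain Python scalars out of the header DataFrame, so nothing appended to the result lists keeps the DataFrame (and the large TDMS buffers behind it) alive, and only build NumPy arrays once the loop is done.

import gc

Vrot, AXMO, T_OFF, T_ON = [], [], [], []

for fich in don:
    tdms_file = TdmsFile(sorted_by_mtime_ascending[fich])
    header = tdms_file.groups()[0].as_dataframe()

    # assumes each header column holds one value per file;
    # float() returns a plain Python scalar, so the lists never hold a
    # reference back into the DataFrame
    Vrot.append(float(header['VitesseMoteur'].iloc[0]))
    AXMO.append(float(header['IncrementAXMO'].iloc[0]))
    T_OFF.append(float(header['T_OFF'].iloc[0]))
    T_ON.append(float(header['T_ON'].iloc[0]))

    del tdms_file, header
    gc.collect()

# convert to NumPy arrays only once, after the loop
Vrot = np.array(Vrot)
AXMO = np.array(AXMO)
T_OFF = np.array(T_OFF)
T_ON = np.array(T_ON)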