I have a script that loops through ~335k filenames, opens the FITS table for each filename, performs a few operations on the table, and writes the results to a file. At the beginning the loop runs relatively fast, but over time it consumes more and more RAM (and CPU resources, I guess), and the script gets slower. I would like to know how I can improve the performance and make the code faster. For example, is there a better way to write to the output file (opening the output file once and doing everything inside a single with-open block vs. opening the file anew every time I need to write to it)? Is there a better way to loop? Can I free memory that I no longer need?
My script looks like this:
import numpy as np
from astropy.table import Table
# spectral is a package for manipulating spectral data
from spectral import BandResampler

# list_of_fnames and df_jpas are defined earlier (not shown here).

# I use this dictionary to cache the BandResampler objects so that I don't have to
# build a new one each time I need it. Rebuilding them every time was slower, I found.
lam_resample_dic = {}

with open("/home/bla/Downloads/output.txt", "ab") as f:
    for ind, fname in enumerate(list_of_fnames):
        data_s = Table.read('/home/nestor/Downloads/all_eBoss_QSO/' + fname, format='fits')
        # lam_str_identifier is just the dict key I need to look up the matching BandResampler below
        lam_str_identifier = ''.join([str(x) for x in data_s['LOGLAM'].data.astype(str)])
        if lam_str_identifier not in lam_resample_dic:
            # Build a BandResampler only when necessary, i.e. when
            # lam_str_identifier indicates a wavelength grid not seen before
            lam_resample_dic[lam_str_identifier] = BandResampler(
                centers1=10**data_s['LOGLAM'],
                centers2=df_jpas["Filter.wavelength"].values,
                fwhm2=df_jpas["Filter.width"].values,
            )
        resample = lam_resample_dic[lam_str_identifier]
        photo_spec = np.around(resample(data_s['FLUX']), 4)
        np.savetxt(f, [photo_spec], delimiter=',', fmt='%1.4f')
        # this is just to keep track of the progress of the loop
        if ind % 1000 == 0:
            print('num of files processed so far:', ind)
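One possible variation on the caching step above, as a minimal sketch and not something benchmarked on the real data: key the dictionary on a fixed-size hash of the wavelength array's raw bytes instead of on one very long joined string, so the keys stay small no matter how long LOGLAM is. The names grid_key, ResamplerCache and fake_factory are hypothetical helpers introduced only for illustration. The factory stands in for the BandResampler(...) call, and the demo uses array.array from the standard library so that it runs without astropy or spectral. One assumption to keep in mind: hashing raw bytes treats the same values stored with a different dtype as a different grid.

```python
import hashlib
from array import array


def grid_key(values):
    """Return a short, fixed-size cache key for a wavelength grid.

    `values` only needs a .tobytes() method, which NumPy arrays
    (e.g. data_s['LOGLAM'].data) and array.array both provide.
    """
    return hashlib.sha1(values.tobytes()).hexdigest()


class ResamplerCache:
    """Build one resampler per distinct grid and reuse it afterwards."""

    def __init__(self, factory):
        # Hypothetical factory: in the real script this would wrap BandResampler(...)
        self._factory = factory
        self._cache = {}

    def get(self, values):
        key = grid_key(values)
        if key not in self._cache:
            self._cache[key] = self._factory(values)
        return self._cache[key]

    def __len__(self):
        return len(self._cache)


# Stand-in demo: the fake factory only records how often it is called.
calls = []


def fake_factory(values):
    calls.append(1)
    return lambda flux: flux  # placeholder for a real resampler


cache = ResamplerCache(fake_factory)
grid_a = array("d", [3.5, 3.6, 3.7])
grid_b = array("d", [3.5, 3.6, 3.8])
for grid in (grid_a, grid_a, grid_b, grid_a):
    cache.get(grid)

print(len(cache), len(calls))  # prints: 2 2
```

In the loop above this would replace the lam_str_identifier lookup with something like cache.get(data_s['LOGLAM'].data). Whether it helps with the growing RAM usage depends on where the memory actually goes, which this sketch does not measure.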
Thanks for any suggestions!