I'm running a genetic hyperparameter search algorithm and it quickly saturates all available memory.
After a few tests it looks like the amount of memory required by keras increases both between different epochs and when training different models. The problem becomes a lot worse as the minibatch size increases, a minibatch size of 1~5 at least gives me enough time to see the memory usage rise up really fast in the first few fits and then slowly but steadily keep increasing over time.
I already checked keras predict memory swap increase indefinitely, Keras: Out of memory when doing hyper parameter grid search, and Keras (TensorFlow, CPU): Training Sequential models in loop eats memory, so I am already clearing keras session and resetting tensorflow's graph after each iteration.
I also tried explicitly deleting the model and history object and running gc.collect() but to no avail.
Im running Keras 2.2.4, tensorflow 1.12.0, Python 3.7.0 on CPU. The code I'm running for each gene and the callback I'm using to measure the memory usage:
import tensorflow as tf
import keras as K
class MemoryCallback(K.callbacks.Callback):
def on_epoch_end(self, epoch, log={}):
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
def Rateme(self,loss,classnum,patience,epochs,DWIshape,Mapshape,lr,TRAINDATA,TESTDATA,TrueTrain, TrueTest,ModelBuilder,maxthreads):
K.backend.set_session(K.backend.tf.Session(config=K.backend.tf.ConfigProto(intra_op_parallelism_threads=maxthreads, inter_op_parallelism_threads=maxthreads)))
#Early Stopping
STOP=K.callbacks.EarlyStopping(monitor='val_acc', min_delta=0.001,
patience=patience, verbose=0, mode='max')
#Build model
Model=ModelBuilder(DWIshape, Mapshape, dropout=self.Dropout,
regularization=self.Regularization,
activ='relu', DWIconv=self.nDWI, DWIsize=self.sDWI,
classes=classnum, layers=self.nCNN,
filtersize=self.sCNN,
FClayers=self.FCL, last=self.Last)
#Compile
Model.compile(optimizer=K.optimizers.Adam(lr,decay=self.Decay), loss=loss, metrics=['accuracy'])
#Fit
his=Model.fit(x=TRAINDATA,y=TrueTrain,epochs=epochs,batch_size=5, shuffle=True, validation_data=(TESTDATA,TrueTest), verbose=0, callbacks=[STOP, MemoryCallback()]) #check verbose and callbacks
#Extract
S=Model.evaluate(x=TESTDATA, y=TrueTest,verbose=1)[1]
del his
del Model
del rateme
K.backend.clear_session()
tf.reset_default_graph()
gc.collect()
return S