I am trying to incriment the per_process_gpu_memory_fraction
value in my tf.GPUOptions()
and then change the Keras session with set_session()
however, the memory fraction never actually changes. After the first run of the while loop, 319MB is reserved as shown in nvidia-smi
, which
a) never gets released when clear_session()
is called, and
b) doesn't go up on the next iteration of the while loop.
import GPUtil
import time
import tensorflow as tf
import numpy as np
from keras.backend.tensorflow_backend import set_session, clear_session, get_session
from tensorflow.python.framework.errors_impl import ResourceExhaustedError, UnknownError
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
def model_trainer():
y_pred = None
errors = 0
total_ram = GPUtil.getGPUs()[0].memoryTotal
total_ram_allowed = GPUtil.getGPUs()[0].memoryTotal * 0.90
mem_amount = 0.005 # intentionally allocated a small amount so it needs to
# increment the mem_amount
x_train = np.empty((10000, 100))
y_train = np.random.randint(0, 9, size=10000)
y_train = to_categorical(y_train, 10)
while y_pred is None:
print("mem", mem_amount)
if total_ram_allowed > total_ram * mem_amount and GPUtil.getGPUs()[0].memoryFree > total_ram * mem_amount:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=mem_amount)
config = tf.ConfigProto(
intra_op_parallelism_threads=2,
inter_op_parallelism_threads=2,
gpu_options=gpu_options)
sess = tf.Session(config=config)
set_session(sess)
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
try:
print(sess)
model.fit(x_train, y_train, epochs=5, batch_size=32)
y_pred = model.predict(x_train)
except (ResourceExhaustedError, UnknownError) as e:
if mem_amount > 1.0:
raise ValueError('model too large for vram')
else:
mem_amount += 0.05
clear_session()
errors += 1
pass
else:
clear_session()
if __name__ == "__main__":
model_trainer()
The puzzling thing is that Keras willingly takes the new session (as shown by a get_session()
call), but won't apply the new GPUOptions
.
In addition to the example above I have tried doing:
clear_session()
del model
clear_session()
del model
gc.collect()
None of this has worked in releasing the VRAM.
My overall goal is to use "trial and error" until the process has enough VRAM to train on, as there seems to be no good way of figuring out how much VRAM is needed for a Keras model without just running it, so that I can run multiple models in parallel on a single GPU. When the ResourceExhaustedError
occurs, I want to release the VRAM that is held by Keras and then try again with a larger amount of VRAM. Is there any way to accomplish this?