
I am training a U-Net on TensorFlow 2. When I load the model, it takes up almost all of the GPU's memory (22 GB out of 26 GB), even though the model's roughly 190 million parameters should need at most about 1.5 GB. To understand the problem, I tried loading a model with no layers at all, and to my surprise it still consumed the same amount of memory. The code for my model is attached below:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

x = tf.keras.layers.Input(shape=(256,256,1))

model = Sequential(
    [
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        # A residual connection (Activation('relu')(Add()([conv5_0, conv5_2])))
        # originally sat here, but it referenced undefined tensors and cannot be
        # expressed inside Sequential; it would need the functional API, so it
        # is omitted from this version.
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(1024, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'), 
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        Conv2D(1, 3, activation = 'linear', padding = 'same', kernel_initializer = 'he_normal')
    ])

y = model(x)
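For reference, a quick sanity check of the weight footprint (assuming standard 4-byte float32 parameters):

# Rough weight-memory estimate: parameter count x 4 bytes (float32).
model.summary()
n_params = model.count_params()
print(f"weights alone: {n_params * 4 / 1024**3:.2f} GiB")
# ~0.7 GiB for ~190M parameters; roughly double that once gradients are
# stored too, so ~1.5 GB is a reasonable upper estimate for the model itself.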

I commented out all the layers and it was still taking up 22 GB. I am running the code from a Jupyter notebook. I thought adding tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=x) at the beginning of the notebook would solve the problem, but it did not. My goal is to run multiple scripts on the GPU simultaneously to make more efficient use of my time. Any help would be much appreciated. Thank you.

NB: I just noticed that this doesn't happen only with this code but with any other TensorFlow call. For example, at one point in my code I used tf.signal.ifft2d before loading the model, and it also took up almost the same amount of memory as the model. How do I get around this problem?

shaurov2253

https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory – mibrahimy Dec 01 '20 at 09:19

3 Answers


You can make TensorFlow allocate GPU memory dynamically (growing as needed instead of grabbing it all up front) like this:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session  # standalone Keras on TF 1.x

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate on demand instead of all at once
sess = tf.Session(config=config)
set_session(sess)
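Note that the keras.backend.tensorflow_backend import path only exists for standalone Keras running on TF 1.x. On TF 2 (which the question uses), roughly the same effect can be had through the v1 compatibility layer; a minimal sketch:

import tensorflow as tf

# TF 2 equivalent via the v1 compatibility layer.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # grow allocations as needed
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)

That said, the native tf.config.experimental.set_memory_growth route shown in the next answer is the cleaner TF 2 option.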
xierui

You will need to limit GPU memory growth; you can find sample code on the TensorFlow GPU guide page.

I have copied the snippet here as well:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth must be set before GPUs have been initialized
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
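The same guide also shows how to put a hard cap on the memory a single process may take, which matches the goal of running several scripts on one GPU. A minimal sketch (the 2048 MB limit is an arbitrary example, and like memory growth it must run before any other op touches the GPU):

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap this process at ~2 GB so other scripts can share the GPU.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)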

I've encountered the same problem in some of my projects, and I noticed that a large batch size causes GPU memory issues. Try to set your batch size as small as possible; I start with a batch size of 1 when the model is complex.
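For example (x_train and y_train are placeholder names for your data):

# Hypothetical training call: start with batch_size=1 and increase it
# until the GPU runs out of memory.
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, batch_size=1, epochs=10)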

Ioannis

Further discussion can be found in the TensorFlow GPU guide at https://www.tensorflow.org/guide/gpu; it is worth reading in full.

DachuanZhao