I am running a bidirectional GRU encoder-decoder with attention on a Tesla V100 GPU. My training set consists of 800 samples of variable length, ranging from 771 to 25,672 tokens. When I train on it, memory consumption reaches the maximum and the process crashes with an OOM error. I then tried running a smaller subset (10 samples) through the same model; it ran successfully, but it still consumed the maximum amount of GPU memory. I already ran those same 10 samples with the same code on a Tesla K80 12GB (Google Colaboratory), and there it consumed only about 3GB. I already use allow_growth, but it does not fix the error.
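For completeness, this is how I enable allow_growth when creating the session (standard TF 1.x API):

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of grabbing it all upfront
sess = tf.Session(graph=graph, config=config)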
Here are the details of my hardware and software:
- Tesla V100 PCIE 16GB
- Ubuntu 18.04
- NVIDIA-SMI 418.67
- CUDA 10.0
- CuDNN 7.4.2
- Python 3.6
- Tensorflow 1.13.1
- Keras 2.2.4
Here are the hyperparameters of my model:
- embedding size = 100
- hidden unit = 512
- alignment unit = 512
- max_summary_len = 450
- learning rate = 0.01
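The model code below also refers to seed, mean and stddev for the weight initializers; as Python constants the settings look roughly like this (the initializer values here are placeholders, the exact numbers should not matter for the OOM):

embedding_size = 100
hidden_unit = 512
alignment_unit = 512
max_summary_len = 450
learning_rate = 0.01
# initializer settings used by tf.random_normal below (placeholder values)
seed = 1
mean = 0.0
stddev = 0.001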
This is the code of my model:
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # inputs: one pre-embedded source sequence and its target summary
    x = tf.placeholder(tf.float32, shape=[1, None, embedding_size])
    y_label = tf.placeholder(tf.float32, shape=[1, max_summary_len, embedding_size])
    initial_input = tf.placeholder(tf.float32, shape=(embedding_size, 1))

    # attention (alignment) parameters
    Wa = tf.Variable(tf.random_normal([hidden_unit, hidden_unit], seed=seed, mean=mean, stddev=0.001))
    v_a = tf.transpose(tf.zeros([alignment_unit, 1], dtype=tf.float32))  # note: constant zeros, not a trainable Variable
    Ua = tf.Variable(tf.random_normal([alignment_unit, 2*hidden_unit], seed=seed, mean=mean, stddev=0.001))

    # decoder GRU weights: update gate (u), reset gate (r), candidate state
    Wu = tf.Variable(tf.random_normal([hidden_unit, embedding_size], seed=seed, mean=mean, stddev=stddev))
    Uu = tf.Variable(tf.random_normal([hidden_unit, hidden_unit], seed=seed, mean=mean, stddev=stddev))
    Cu = tf.Variable(tf.random_normal([hidden_unit, 2*hidden_unit], seed=seed, mean=mean, stddev=stddev))
    Wr = tf.Variable(tf.random_normal([hidden_unit, embedding_size], seed=seed, mean=mean, stddev=stddev))
    Ur = tf.Variable(tf.random_normal([hidden_unit, hidden_unit], seed=seed, mean=mean, stddev=stddev))
    Cr = tf.Variable(tf.random_normal([hidden_unit, 2*hidden_unit], seed=seed, mean=mean, stddev=stddev))
    W = tf.Variable(tf.random_normal([hidden_unit, embedding_size], seed=seed, mean=mean, stddev=stddev))
    U = tf.Variable(tf.random_normal([hidden_unit, hidden_unit], seed=seed, mean=mean, stddev=stddev))
    C = tf.Variable(tf.random_normal([hidden_unit, 2*hidden_unit], seed=seed, mean=mean, stddev=stddev))

    # output projection
    Ww_o = tf.Variable(tf.random_normal([embedding_size, embedding_size], seed=seed, mean=mean, stddev=stddev))
    Wc_o = tf.Variable(tf.random_normal([embedding_size, 2*hidden_unit], seed=seed, mean=mean, stddev=stddev))
    Ws_o = tf.Variable(tf.random_normal([embedding_size, hidden_unit], seed=seed, mean=mean, stddev=stddev))
    Wo = tf.Variable(tf.random_normal([embedding_size, 1], seed=seed, mean=mean, stddev=stddev))
    b_o = tf.zeros([embedding_size, 1])

    # define model
    """__encoder___"""
    encoder_LSTM = tf.keras.layers.CuDNNGRU(hidden_unit, return_sequences=True, return_state=True)
    encoder_LSTM_rev = tf.keras.layers.CuDNNGRU(hidden_unit, return_state=True, return_sequences=True, go_backwards=True)
    encoder_outputs, state_h = encoder_LSTM(x)
    encoder_outputsR, state_hR = encoder_LSTM_rev(x)
    state_hfinal = tf.keras.layers.Add()([state_h, state_hR])  # combined final state (not used further below)
    encoder_outputs_final = tf.concat([encoder_outputs, encoder_outputsR], axis=2)

    """__decoder___"""
    initial_state = tf.zeros([hidden_unit, 1], dtype=tf.float32)
    arr = tf.reshape(initial_input, (1, embedding_size))

    def decoderStep(arr, last_output, last_state, step):
        # something wrong? the previous state is tiled once per input position
        prev_state_rep = tf.tile(last_state, (1, tf.shape(encoder_outputs_final)[1]))
        # alignment scores over the encoder outputs
        e = tf.matmul(v_a, tf.tanh(tf.add(tf.matmul(Wa, prev_state_rep),
                                          tf.matmul(Ua, tf.reshape(encoder_outputs_final, [2*hidden_unit, -1])))))
        # softmax over input positions (pembilang = numerator, penyebut = denominator)
        pembilang = tf.math.exp(e)
        penyebut = tf.reduce_sum(pembilang, axis=1)
        penyebut = tf.reshape(penyebut, [1, 1])
        penyebut = tf.tile(penyebut, (1, tf.shape(e)[1]))  # the denominator is tiled once per input position
        alphas = pembilang / penyebut
        # context vector: attention-weighted sum of encoder outputs
        c = tf.reduce_sum(alphas * tf.reshape(encoder_outputs_final, [2*hidden_unit, -1]), axis=1)
        c = tf.expand_dims(c, 1)
        # GRU gates and state update
        u = tf.nn.sigmoid(tf.matmul(Wu, last_output) + tf.matmul(Uu, last_state) + tf.matmul(Cu, c))
        r = tf.nn.sigmoid(tf.matmul(Wr, last_output) + tf.matmul(Ur, last_state) + tf.matmul(Cr, c))
        s_ = tf.nn.tanh(tf.matmul(W, last_output) + tf.multiply(tf.matmul(U, r), tf.matmul(U, last_state)) + tf.matmul(C, c))
        s = tf.multiply(1 - u, last_state) + tf.multiply(u, s_)
        # output token embedding, appended to the running prediction array
        o = tf.matmul(Ww_o, last_output) + tf.matmul(Wc_o, c) + tf.matmul(Ws_o, s)
        y_pred = tf.multiply(Wo, o) + b_o
        arr = tf.concat([arr, tf.reshape(y_pred, (1, embedding_size))], 0)
        return [arr, y_pred, s, tf.add(step, 1)]

    i = tf.constant(0)
    each_step = lambda c, a, b, step: tf.less(step, max_summary_len - 1)
    predss, ltm_pred, lstm_state, _ = tf.while_loop(
        each_step, decoderStep, [arr, initial_input, initial_state, i],
        shape_invariants=[tf.TensorShape([None, embedding_size]),
                          initial_input.get_shape(),
                          initial_state.get_shape(),
                          i.get_shape()])
    preds = tf.expand_dims(predss, 0)

    loss = tf.losses.mean_squared_error(y_label, preds)
    # loss = tf.reduce_mean(-tf.reduce_sum(y_label * tf.log(tf.math.abs(preds)), axis=[0]))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    saver = tf.train.Saver()
    init = tf.global_variables_initializer()
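I run the graph one sample at a time, roughly like this (simplified; get_sample is a stand-in for my actual data pipeline, and config is the allow_growth ConfigProto shown above):

with tf.Session(graph=graph, config=config) as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for idx in range(n_samples):
            # batch_x: (1, seq_len, embedding_size), batch_y: (1, max_summary_len, embedding_size)
            batch_x, batch_y, first_token = get_sample(idx)  # stand-in for my data loading
            _, batch_loss = sess.run([optimizer, loss],
                                     feed_dict={x: batch_x,
                                                y_label: batch_y,
                                                initial_input: first_token})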
Here is the error message:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,8370] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node while/mul (defined at ta_skenario1.py:226) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[node mean_squared_error/value (defined at ta_skenario1.py:246) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
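Following that hint, passing a RunOptions proto to sess.run should list the live allocations when the OOM happens; roughly like this:

run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)
_, batch_loss = sess.run([optimizer, loss],
                         feed_dict={x: batch_x, y_label: batch_y, initial_input: first_token},
                         options=run_opts)  # dumps allocated tensors on OOM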
This is the result of nvidia-smi:
[nvidia-smi screenshot]
Any help? Thank you
UPDATE: I just realized that my code always hits OOM in the TensorFlow (decoder) part. It never OOMs in the Keras CuDNNGRU encoder. Any suggestion to make my code simpler? Thanks.
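One thing I have not tried yet is tf.while_loop's swap_memory flag, which lets TensorFlow swap the loop's intermediate tensors out to host memory during backprop, trading speed for GPU memory; a minimal sketch of enabling it on the decoder loop above:

predss, ltm_pred, lstm_state, _ = tf.while_loop(
    each_step, decoderStep, [arr, initial_input, initial_state, i],
    shape_invariants=[tf.TensorShape([None, embedding_size]),
                      initial_input.get_shape(),
                      initial_state.get_shape(),
                      i.get_shape()],
    swap_memory=True)  # keep loop tensors in host RAM instead of on the GPU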
UPDATE: I changed my embedding size to 64, hidden units to 128, and alignment units to 64. The OOM error is gone, but one epoch now takes a very long time (approx. 13 minutes).