
I'm training an LSTM on a dataset of a couple of GB using the Keras API with the TensorFlow backend. When I run Model.fit() on the in-memory data (numpy arrays), TensorFlow allocates 8 GB of GPU memory in a single request, which doesn't happen when I load only a small subset of the data. My GPU can't hold both the model parameters and that 8 GB, so it runs out of memory and training stops. I'm fairly sure this started after I upgraded from the TF2 beta to the TF2 RC. Here's how I call fit:

tb = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
es = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=patience*2, restore_best_weights=True)
lr_reduce = keras.callbacks.ReduceLROnPlateau(factor=0.1, patience=patience, verbose=1)
chkpointing = keras.callbacks.ModelCheckpoint(weight_fname, monitor='val_loss', verbose=0, save_best_only=True,
                                              save_weights_only=True, mode='auto')

model.fit(train_data_x, train_data_y, validation_data=(test_data_x, test_data_y), batch_size=cfg['batch_size'],
                  epochs=nepochs, validation_freq=1, callbacks=[lr_reduce, es, tb, chkpointing],
                  class_weight=cfg['class_weight'], shuffle=True)

Is allocating space for the whole dataset on the GPU intended behavior? How can I prevent it from happening?
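For reference, here is a rough sketch (the shuffle buffer size is just a placeholder) of feeding the same arrays through a batched tf.data.Dataset, so that in principle only batches get transferred to the GPU; whether this sidesteps the 8.14 GiB request is untested here:

train_ds = (tf.data.Dataset.from_tensor_slices((train_data_x, train_data_y))
            .shuffle(10_000)  # placeholder buffer size
            .batch(cfg['batch_size'])
            .prefetch(tf.data.experimental.AUTOTUNE))
val_ds = (tf.data.Dataset.from_tensor_slices((test_data_x, test_data_y))
          .batch(cfg['batch_size']))

# class_weight is omitted here; with a Dataset it may need to be folded in as sample weights
model.fit(train_ds, validation_data=val_ds, epochs=nepochs,
          callbacks=[lr_reduce, es, tb, chkpointing])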

EDIT:

I updated the code to limit memory allocation. It does limit it (the logs show TF now has access to less memory than before), but it still attempts to allocate that 8.14 GiB. Here's how I limit the memory and select the GPU:

import os
import tensorflow as tf


def select_gpu(gpu_id=-1, max_usage=.5):  # assumes at most 2 GPUs
    # Restrict CUDA to the chosen GPU (or expose both if gpu_id == -1)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id) if gpu_id != -1 else '0,1'
    gpus = tf.config.experimental.list_physical_devices('GPU')
    max_memory = 11534  # MB, from: grep -i --color memory /var/log/Xorg.0.log
    for gpu in gpus:
        print('GPU FOUND:', gpu)
        tf.config.experimental.set_memory_growth(gpu, True)  # FIXME true
        tf.config.experimental.set_virtual_device_configuration(gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=max_memory * max_usage)])
    print('RUNNING ON GPU #{}'.format(gpu_id))

# ... just call select_gpu(0) at the beginning of the script

Here's the error:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed (TimeDistri (None, 42, 256)           7168      
_________________________________________________________________
cu_dnnlstm (CuDNNLSTM)       (None, 42, 256)           526336    
_________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM)     (None, 42, 256)           526336    
_________________________________________________________________
cu_dnnlstm_2 (CuDNNLSTM)     (None, 42, 256)           526336    
_________________________________________________________________
cu_dnnlstm_3 (CuDNNLSTM)     (None, 42, 256)           526336    
_________________________________________________________________
cu_dnnlstm_4 (CuDNNLSTM)     (None, 42, 256)           526336    
_________________________________________________________________
cu_dnnlstm_5 (CuDNNLSTM)     (None, 256)               526336    
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 257       
=================================================================
Total params: 3,231,233
Trainable params: 3,231,233
Non-trainable params: 0
_________________________________________________________________
None
2019-10-27 12:36:48.833843: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 8.14GiB (rounded to 8738821888).  Current allocation summary follows.
2019-10-27 12:36:48.833927: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256):   Total Chunks: 16, Chunks in use: 15. 4.0KiB allocated for chunks. 3.8KiB in use in bin. 72B client-requested in use in bin.
2019-10-27 12:36:48.833944: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.833958: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024):  Total Chunks: 5, Chunks in use: 4. 5.5KiB allocated for chunks. 4.2KiB in use in bin. 4.0KiB client-requested in use in bin.
2019-10-27 12:36:48.833970: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.833982: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096):  Total Chunks: 1, Chunks in use: 0. 4.8KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.833998: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192):  Total Chunks: 6, Chunks in use: 6. 49.8KiB allocated for chunks. 49.8KiB in use in bin. 48.0KiB client-requested in use in bin.
2019-10-27 12:36:48.834012: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384):     Total Chunks: 1, Chunks in use: 1. 27.0KiB allocated for chunks. 27.0KiB in use in bin. 27.0KiB client-requested in use in bin.
2019-10-27 12:36:48.834023: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834034: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834045: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834060: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144):    Total Chunks: 1, Chunks in use: 1. 504.0KiB allocated for chunks. 504.0KiB in use in bin. 256.0KiB client-requested in use in bin.
2019-10-27 12:36:48.834073: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288):    Total Chunks: 1, Chunks in use: 0. 512.0KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834088: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576):   Total Chunks: 12, Chunks in use: 12. 12.00MiB allocated for chunks. 12.00MiB in use in bin. 12.00MiB client-requested in use in bin.
2019-10-27 12:36:48.834099: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834110: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834122: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834132: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834143: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834156: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834167: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834180: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456):     Total Chunks: 1, Chunks in use: 0. 4.49GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-27 12:36:48.834193: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 8.14GiB was 256.00MiB, Chunk State: 
2019-10-27 12:36:48.834213: I tensorflow/core/common_runtime/bfc_allocator.cc:891]   Size: 4.49GiB | Requested Size: 1.00MiB | in_use: 0 | bin_num: 20, prev:   Size: 1.00MiB | Requested Size: 1.00MiB | in_use: 1 | bin_num: -1
2019-10-27 12:36:48.834223: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 4837081088
2019-10-27 12:36:48.834237: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000000 next 1 of size 256
2019-10-27 12:36:48.834247: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000100 next 2 of size 256
2019-10-27 12:36:48.834257: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000200 next 3 of size 1280
2019-10-27 12:36:48.834267: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000700 next 4 of size 256
2019-10-27 12:36:48.834277: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000800 next 5 of size 1024
2019-10-27 12:36:48.834287: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000c00 next 8 of size 256
2019-10-27 12:36:48.834296: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000d00 next 9 of size 256
2019-10-27 12:36:48.834306: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000e00 next 10 of size 256
2019-10-27 12:36:48.834316: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6000f00 next 13 of size 256
2019-10-27 12:36:48.834325: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001000 next 34 of size 256
2019-10-27 12:36:48.834335: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001100 next 35 of size 256
2019-10-27 12:36:48.834344: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001200 next 37 of size 256
2019-10-27 12:36:48.834354: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001300 next 16 of size 256
2019-10-27 12:36:48.834363: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001400 next 14 of size 256
2019-10-27 12:36:48.834373: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 0x7f3cf6001500 next 40 of size 1280
2019-10-27 12:36:48.834382: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6001a00 next 41 of size 1024
2019-10-27 12:36:48.834392: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 0x7f3cf6001e00 next 18 of size 4864
2019-10-27 12:36:48.834402: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6003100 next 19 of size 8192
2019-10-27 12:36:48.834411: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6005100 next 36 of size 1024
2019-10-27 12:36:48.834420: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6005500 next 39 of size 256
2019-10-27 12:36:48.834430: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6005600 next 42 of size 256
2019-10-27 12:36:48.834439: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6005700 next 43 of size 256
2019-10-27 12:36:48.834449: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 0x7f3cf6005800 next 21 of size 256
2019-10-27 12:36:48.834459: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6005900 next 22 of size 8192
2019-10-27 12:36:48.834469: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6007900 next 25 of size 8192
2019-10-27 12:36:48.834478: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6009900 next 28 of size 8192
2019-10-27 12:36:48.834488: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf600b900 next 6 of size 9984
2019-10-27 12:36:48.834500: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf600e000 next 7 of size 27648
2019-10-27 12:36:48.834509: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6014c00 next 33 of size 8192
2019-10-27 12:36:48.834519: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 0x7f3cf6016c00 next 38 of size 524288
2019-10-27 12:36:48.834528: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6096c00 next 17 of size 516096
2019-10-27 12:36:48.834538: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6114c00 next 12 of size 1048576
2019-10-27 12:36:48.834548: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6214c00 next 11 of size 1048576
2019-10-27 12:36:48.834558: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6314c00 next 20 of size 1048576
2019-10-27 12:36:48.834567: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6414c00 next 15 of size 1048576
2019-10-27 12:36:48.834577: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6514c00 next 24 of size 1048576
2019-10-27 12:36:48.834586: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6614c00 next 23 of size 1048576
2019-10-27 12:36:48.834595: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6714c00 next 27 of size 1048576
2019-10-27 12:36:48.834605: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6814c00 next 26 of size 1048576
2019-10-27 12:36:48.834614: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6914c00 next 30 of size 1048576
2019-10-27 12:36:48.834623: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6a14c00 next 29 of size 1048576
2019-10-27 12:36:48.834633: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6b14c00 next 32 of size 1048576
2019-10-27 12:36:48.834642: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f3cf6c14c00 next 31 of size 1048576
2019-10-27 12:36:48.834652: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 0x7f3cf6d14c00 next 18446744073709551615 of size 4823364608
2019-10-27 12:36:48.834661: I tensorflow/core/common_runtime/bfc_allocator.cc:914]      Summary of in-use Chunks by size: 
2019-10-27 12:36:48.834673: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 15 Chunks of size 256 totalling 3.8KiB
2019-10-27 12:36:48.834684: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 1024 totalling 3.0KiB
2019-10-27 12:36:48.834694: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1280 totalling 1.2KiB
2019-10-27 12:36:48.834706: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 5 Chunks of size 8192 totalling 40.0KiB
2019-10-27 12:36:48.834715: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 9984 totalling 9.8KiB
2019-10-27 12:36:48.834726: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 27648 totalling 27.0KiB
2019-10-27 12:36:48.834736: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 516096 totalling 504.0KiB
2019-10-27 12:36:48.834747: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 12 Chunks of size 1048576 totalling 12.00MiB
2019-10-27 12:36:48.834759: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 12.57MiB
2019-10-27 12:36:48.834769: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 4837081088 memory_limit_: 4837081088 available bytes: 0 curr_region_allocation_bytes_: 9674162176
2019-10-27 12:36:48.834784: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  4837081088
InUse:                    13185792
MaxInUse:                 14756864
NumAllocs:                     186
MaxAllocSize:              1048576

As you can see, the model is small; it needs nowhere near 8 GB.
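A quick sanity check (assuming float32 weights): 3,231,233 parameters at 4 bytes each is roughly 12.3 MiB, which matches the "Sum Total of in-use chunks: 12.57MiB" line in the log, so the 8.14 GiB request cannot be the weights themselves.

params = 3_231_233          # from the model summary above
print(params * 4 / 2**20)   # ~12.3 MiB of float32 weights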

EDIT #2:

I just reverted to the TF2 beta (tensorflow-gpu==2.0.0-beta1) and the problem disappeared. I hope we can find a better solution than that.

Viktor Tóth

1 Answer


This is TensorFlow's default behavior: it allocates more than it actually needs (and it may not be exactly the dataset that's being allocated). You only need the model and the immediate tensors/data kept in the TF/Keras session, which in TF2 you can enforce by capping the memory TensorFlow may use:

import tensorflow as tf

max_memory = 8000  # dedicated GPU memory in MB; run 'dxdiag' (Windows) to get the exact figure
max_usage = int(0.95 * max_memory)  # example: let TF use up to 95% of it

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],  # the GPU used for training; must run before that GPU is initialized
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=max_usage)])

Also see the TensorFlow docs on limiting GPU memory growth, and the relevant GitHub discussion.
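The docs' other option is memory growth, which makes TF start with a small allocation and expand on demand instead of reserving a fixed block up front; a minimal sketch, to be run before any GPU has been initialized:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)  # allocate lazily, grow as needed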


UPDATE: TF2 eager execution seems to have a known memory management issue; as a workaround, disable it to run in graph mode, which can also be significantly faster - see details here:

tf.compat.v1.disable_eager_execution()
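The call only needs to happen once, right after importing TensorFlow and before any model is built; a sketch (model-building code elided):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # fall back to graph mode as a workaround

# ... build, compile and fit the Keras model as usual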
OverLordGoldDragon
  • Thanks for the answer! I tried it, it doesn't seem to work, check my edit for more info. – Viktor Tóth Oct 27 '19 at 16:46
  • @ViktorTóth What value of `max_usage` did you use? Have you tried lowering it? – OverLordGoldDragon Oct 27 '19 at 17:50
  • Yes, I tried 0.9 first, then lowered it to 0.5, which is less than 8 GB; got the same error – Viktor Tóth Oct 27 '19 at 18:15
  • I have two 2080 ti btw, running the training only on one of them. – Viktor Tóth Oct 27 '19 at 18:19
  • @ViktorTóth That's important info - then `len(gpus)==2`, and `[0]` may not be the one used in training; try `gpus[1]` in answer code snippet – OverLordGoldDragon Oct 27 '19 at 18:28
  • I added the code snippet for GPU selection that I use; I think I did that part right – Viktor Tóth Oct 27 '19 at 18:36
  • As soon as I reverted back to TF2 beta it runs fine (without the need for limiting memory) – Viktor Tóth Oct 27 '19 at 19:55
  • @ViktorTóth I'll look into it a bit later, but for now, make sure your CUDA & cuDNN versions are compatible with TensorFlow 2.0.0: [list](https://www.tensorflow.org/install/source_windows#gpu). For reference, I use CUDA 10.0.130, cuDNN 7.4.2 (latest stable versions). Also, try disabling TensorBoard and see if it helps - this isn't exactly a solution, but I observed absurd slowdowns for a large model when running TB, a likely bug, though I didn't get around to fixing it. – OverLordGoldDragon Oct 27 '19 at 20:05
  • @ViktorTóth Had to scrap and reinstall all TF-related modules, CUDA included - the process is quite tricky. To note, cuDNN 7.4.6 should work fine, and the compatibility table is outdated - try updating to that version, and run all installs via Anaconda (`tensorflow-gpu==2.0.0` support added 4 days ago). If that fails, it'd help to see your full model code to reproduce the issue (both the out-of-memory and "small dataset" cases). – OverLordGoldDragon Oct 29 '19 at 00:17
  • @ViktorTóth Update: you seem to be right, and this seems to be a bug: OOM _is_ related to data size; same code (in my application) on TF1 runs just fine, even without any memory-limiting code. I'll investigate. (P.S. correction: cuDNN 7.6.0, not 7.4.6) – OverLordGoldDragon Oct 30 '19 at 12:50
  • @ViktorTóth Figured it out, at least for myself: try `tf.compat.v1.disable_eager_execution()` - worked for me even without any memory limiting code. If it works for you, I'll update the answer and add an explanation - [relevant SO](https://stackoverflow.com/questions/58441514/why-is-tensorflow-2-much-slower-than-tensorflow-1) – OverLordGoldDragon Oct 30 '19 at 13:48
  • Nice, yeah it works fine by disabling eager exec. Thanks a lot! – Viktor Tóth Nov 22 '19 at 22:45