I am attempting to train a custom model in Magenta, which piggybacks off tensorflow-gpu. The issue is that, no matter what I try, TensorFlow is unable to allocate GPU memory properly and never starts training. For the record, here is the command I am using:
t2t_trainer --data_dir="{folder}" --hparams="label_smoothing=0.0, max_length=0,max_target_seq_length=4096" --hparams_set=score2perf_transformer_base --model=transformer --output_dir="{folder}" --problem=score2perf_maestro_language_uncropped_aug --train_steps=2500
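The only thing that differs between a run that works and one that fails is max_target_seq_length in the hparams string; the working configuration is:

--hparams="label_smoothing=0.0, max_length=0,max_target_seq_length=2048"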
With max_target_seq_length at 2048 this trains without issue, using only about 25% of the CPU and GPU. I have an i7-9600k and an RTX 2070 with 8 GB of VRAM. When I bump it to 4096, however, it starts failing on even the smallest GPU allocations. Here is a (condensed) version of the logs:
2019-11-14 14:38:14.028064: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 7.60G (8160437760 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-14 14:38:14.028311: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 6.84G (7344393728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
WARNING:tensorflow:From c:\python\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W1114 14:38:14.551839 9104 deprecation.py:323] From c:\python\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I1114 14:38:14.811158 9104 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1114 14:38:14.944813 9104 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\conspiracy2\Documents\music comp\out\11-14 set 2.0\checkpts\model.ckpt.
I1114 14:38:17.920329 9104 basic_session_run_hooks.py:606] Saving checkpoints for 0 into C:\Users\conspiracy2\Documents\music comp\out\11-14 set 2.0\checkpts\model.ckpt.
2019-11-14 14:38:21.598678: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-11-14 14:38:22.574418: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.44G (1550483456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-14 14:38:22.574642: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.44G (1550483456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-14 14:38:32.575117: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.44G (1550483456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-14 14:38:32.575322: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.44G (1550483456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-14 14:38:32.575478: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 576.00MiB (rounded to 603979776). Current allocation summary follows.
2019-11-14 14:38:32.575683: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 75, Chunks in use: 69. 18.8KiB allocated for chunks. 17.3KiB in use in bin. 304B client-requested in use in bin.
2019-11-14 14:38:32.575871: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.576033: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 2, Chunks in use: 2. 2.3KiB allocated for chunks. 2.3KiB in use in bin. 2.0KiB client-requested in use in bin.
2019-11-14 14:38:32.576206: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 96, Chunks in use: 96. 192.0KiB allocated for chunks. 192.0KiB in use in bin. 192.0KiB client-requested in use in bin.
2019-11-14 14:38:32.576406: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.576604: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 19, Chunks in use: 18. 158.8KiB allocated for chunks. 144.0KiB in use in bin. 144.0KiB client-requested in use in bin.
2019-11-14 14:38:32.576926: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 13, Chunks in use: 12. 208.0KiB allocated for chunks. 192.0KiB in use in bin. 192.0KiB client-requested in use in bin.
2019-11-14 14:38:32.577128: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 48, Chunks in use: 48. 1.82MiB allocated for chunks. 1.82MiB in use in bin. 1.82MiB client-requested in use in bin.
2019-11-14 14:38:32.577355: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.577566: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.577770: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.577973: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 1, Chunks in use: 1. 620.0KiB allocated for chunks. 620.0KiB in use in bin. 620.0KiB client-requested in use in bin.
2019-11-14 14:38:32.578238: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 72, Chunks in use: 72. 72.00MiB allocated for chunks. 72.00MiB in use in bin. 72.00MiB client-requested in use in bin.
2019-11-14 14:38:32.578395: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.578561: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 37, Chunks in use: 36. 151.84MiB allocated for chunks. 144.00MiB in use in bin. 144.00MiB client-requested in use in bin.
2019-11-14 14:38:32.578834: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 38, Chunks in use: 37. 304.00MiB allocated for chunks. 296.00MiB in use in bin. 296.00MiB client-requested in use in bin.
2019-11-14 14:38:32.579017: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.579203: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 6, Chunks in use: 6. 192.00MiB allocated for chunks. 192.00MiB in use in bin. 192.00MiB client-requested in use in bin.
2019-11-14 14:38:32.579489: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 2, Chunks in use: 1. 160.00MiB allocated for chunks. 64.00MiB in use in bin. 64.00MiB client-requested in use in bin.
2019-11-14 14:38:32.579704: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-14 14:38:32.579998: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 11, Chunks in use: 10. 5.29GiB allocated for chunks. 5.00GiB in use in bin. 5.00GiB client-requested in use in bin.
2019-11-14 14:38:32.580279: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 576.00MiB was 256.00MiB, Chunk State:
2019-11-14 14:38:32.580407: I tensorflow/core/common_runtime/bfc_allocator.cc:891] Size: 300.92MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 512.00MiB | Requested Size: 512.00MiB |
2019-11-14 14:38:32.643932: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 5.75GiB
2019-11-14 14:38:32.644132: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 6609954304 memory_limit_: 8160437862 available bytes: 1550483558 curr_region_allocation_bytes_: 16320876032
2019-11-14 14:38:32.644377: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 8160437862
InUse: 6177115648
MaxInUse: 6185504256
NumAllocs: 611
MaxAllocSize: 603979776
2019-11-14 14:38:32.644686: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *************************************_**********************************************************____
2019-11-14 14:38:32.644868: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at pad_op.cc:122 : Resource exhausted: OOM when allocating tensor with shape[16777216,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "c:\python\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "c:\python\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "c:\python\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[16777216,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node transformer/parallel_0_4/transformer/transformer/body/decoder/layer_2/self_attention/multihead_attention/dot_product_attention/Pad}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
I have put a fuller version of the relevant logs in a pastebin here: https://pastebin.com/CQpYdUC4
To get the obvious questions out of the way: no, I'm not running any other programs that use the GPU, and no, I'm not running multiple training instances. It fails to allocate even ~512 MB of GPU memory, even though roughly 8 GB should be available.
I have tried manually reducing the memory_fraction to as low as 0.2 in the t2t_trainer.py script, and I have also tried setting allow_growth. Neither of these seems to help, although setting the memory_fraction to 0.2 did lower the memory limit, so the first failed allocation attempt was 1.44 GB instead of ~7.6 GB.
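For reference, the two knobs I changed inside t2t_trainer.py correspond to the standard TF 1.x GPU options shown below (a minimal sketch of the settings I mean, not the actual tensor2tensor code; 0.2 is the fraction I tried):

import tensorflow as tf  # TensorFlow 1.14

config = tf.ConfigProto()
# Let the allocator grow on demand instead of reserving ~all VRAM up front.
config.gpu_options.allow_growth = True
# Cap TensorFlow at 20% of the card's 8 GB (the value I tried).
config.gpu_options.per_process_gpu_memory_fraction = 0.2

# This session config is then handed to the Estimator, e.g. via
# tf.estimator.RunConfig(session_config=config).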
I am at my wits' end. For the record, this is TensorFlow 1.14 and CUDA 10.0, since those versions are required by the model.