Running a training script results in a memory error on a GPU-optimized Ubuntu machine. The error looks suspicious, as the machine has enough memory to run the algorithm.

Here is the error:

TensorFlow: Ran out of memory trying to allocate 16.0KiB

Memory status:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15038         190        6580           8        8267       14670
Swap:             0           0           0

Console output with the error:

$ python ./train.py --run --continue
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Loading data..
Number of categories: 2
Number of samples 425
/home/ubuntu/DeepClassificationBot-master/data.py:134: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  val = np.random.choice(dataset_indx, size=number_of_samples)
/home/ubuntu/DeepClassificationBot-master/data.py:127: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  train = np.random.choice(dataset_indx, size=number_of_samples)
Building and Compiling model..
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Training..
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16384):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4194304):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:656] Bin for 16.0KiB was 16.0KiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x702580000 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x702581b00 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x702583600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x702583700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x702583800 of size 147456
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7025a7800 of size 147456
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7025cb800 of size 256
....... Very long list of chunks
I tensorflow/core/common_runtime/bfc_allocator.cc:689]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 115 Chunks of size 256 totalling 28.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 34 Chunks of size 512 totalling 17.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 21 Chunks of size 1024 totalling 21.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 42 Chunks of size 2048 totalling 84.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 6912 totalling 47.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 42 Chunks of size 16384 totalling 672.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 32768 totalling 160.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 147456 totalling 1008.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 294912 totalling 1.97MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 589824 totalling 3.94MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 1179648 totalling 7.88MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 14 Chunks of size 2359296 totalling 31.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7 Chunks of size 4718592 totalling 31.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 35 Chunks of size 9437184 totalling 315.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 67108864 totalling 320.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 6 Chunks of size 411041792 totalling 2.30GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 663988224 totalling 633.23MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 3.61GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats: 
Limit:                  3878682624
InUse:                  3878682624
MaxInUse:               3878682624
NumAllocs:                     362
MaxAllocSize:            663988224

W tensorflow/core/common_runtime/bfc_allocator.cc:270] **********************************************************************************************xxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 16.0KiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:930] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
     [[Node: Variable_91/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [4096] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "./train.py", line 154, in <module>
    run(extract=extract_mode, cont=continue_)
  File "./train.py", line 104, in run
    sample_weight=None)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 405, in fit
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1031, in fit
    self._make_train_function()
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 658, in _make_train_function
    training_updates = self.optimizer.get_updates(trainable_weights, self.constraints, self.total_loss)
  File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 314, in get_updates
    vs = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 78, in variable
    get_session().run(v.initializer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
     [[Node: Variable_91/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [4096] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'Variable_91/initial_value', defined at:
  File "./train.py", line 154, in <module>
    run(extract=extract_mode, cont=continue_)
  File "./train.py", line 104, in run
    sample_weight=None)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 405, in fit
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1031, in fit
    self._make_train_function()
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 658, in _make_train_function
    training_updates = self.optimizer.get_updates(trainable_weights, self.constraints, self.total_loss)
  File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 314, in get_updates
    vs = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 75, in variable
    v = tf.Variable(np.asarray(value, dtype=dtype), name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 211, in __init__
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 289, in _init_from_args
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 628, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 167, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
    self._traceback = _extract_stack()

This works, but it still issues memory warnings:

$ python deploy.py --URL http://www.example.com/image.jpg
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
1/1 [==============================] - 2s
______________________________________________
Image Name: image.jpg
Categories: 
1. shrek 100.00%
2. darth vader 0.00%
    I think it's likely that the out of memory error is associated with GPU memory. The K520 GPU you are running on has 4GB of memory and there is various overhead. You seem to have 3.6GB allocated. So it seems you are probably out of (GPU) memory. – Robert Crovella Nov 28 '16 at 23:14
  • Yes, that would make sense. Still, I am not sure how I ended up using 3.6GB of GPU memory. The training algorithm was able to run on a machine with 1.5GB of GPU memory. I would like to know where the extra memory is going and how to limit it to avoid this error (see the memory-limiting sketch after these comments). – Peter G. Nov 28 '16 at 23:22
  • The easiest way I know is to edit IsEnabled in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/log_memory.cc to always return true and rerun. This will print each tensor allocation with a name and op responsible. – Alexandre Passos Nov 29 '16 at 00:09
  • If you don't want to recompile, you can run with RunOptions.FULL_TRACE as described here -- https://github.com/tensorflow/tensorflow/issues/1824#issuecomment-225754659 . May need to run with smaller model/batch-size to have the step complete, but at least you'll see how memory is allocated (a short sketch is included below). – Yaroslav Bulatov Nov 29 '16 at 02:13
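
Regarding the comment asking how to limit GPU memory use, here is a minimal sketch, assuming a TensorFlow/Keras build from this era where tf.ConfigProto exposes gpu_options and the Keras TensorFlow backend provides set_session (this snippet is illustrative, not part of the original script):

import tensorflow as tf
from keras import backend as K

# Configure the TensorFlow session before any model is built.
config = tf.ConfigProto()
# Option 1: allocate GPU memory on demand instead of reserving almost all of it up front.
config.gpu_options.allow_growth = True
# Option 2: hard-cap the fraction of GPU memory this process may use (0.5 is just an example value).
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

# Hand the configured session to Keras so fit() uses it.
K.set_session(tf.Session(config=config))

Note that this only controls how much memory the allocator reserves; if the model plus optimizer state genuinely needs more than the K520's ~3.9GiB, training will still fail, so a smaller model or batch size may also be needed.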
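
And a small self-contained sketch of the RunOptions.FULL_TRACE approach mentioned above; the variable here is only a stand-in, and in the real script the options would be passed to the session.run call that executes a training step:

import tensorflow as tf

# Ask the runtime to record per-node timing and memory statistics.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    v = tf.Variable(tf.zeros([4096]))  # stand-in for real model variables
    sess.run(tf.initialize_all_variables(),
             options=run_options,
             run_metadata=run_metadata)
    # step_stats lists every executed node along with its allocator usage.
    print(run_metadata.step_stats)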
