I am trying to train my model using tensorflow.keras, but it fails with an out-of-memory (OOM) error after some number of training steps. TensorFlow 2.0 has marked many things as deprecated, and I can't tell how I am supposed to diagnose the problem anymore.
The network is a series of Conv1D layers and a few self-attention layers, converting one sequence to another. The sequences are variable length, but there is no correlation between sequence length and when it fails; e.g., it may process a 6-minute sequence fine but then fail on a 4-minute one.
    with tensorflow.device('/device:gpu:0'):
        m2t = BuildGenerator()  # builds and returns the model
        m2t.compile(optimizer='adam', loss='mse')
        for epoch in range(1):
            for inout in InputGenerator(params):
                m2t.train_on_batch(inout[0], inout[1])
Things I have tried:
- Removing the self-attention layers. It still fails.
- Removing all but a small number of layers. It still fails.
- Padding all sequences to a constant length. It still fails.
- Using m2t.predict(inout[0]) instead of train_on_batch. It still fails, just after a longer time.
- Using tensorflow.summary.trace_export. It records something, but the trace doesn't load in Chrome the way the page HERE suggests it should.
- Following THIS answer, but with the changes in TF 2.0 I'm not sure how to do that properly; my best guess at the TF 2.0 equivalent is sketched below.
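To be explicit about that last point, this is the closest TF 2.0 equivalent I could come up with. I'm not at all sure these are the right replacements for the old ConfigProto/allow_growth setup, so treat it as a sketch:

    import tensorflow as tf

    # My guess at the TF 2.0 way to make the allocator grow on demand
    # instead of reserving all GPU memory up front.
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)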
There are no other calls into tensorflow or keras.
EDIT: As requested, here are sample error logs. It is a slightly different error every time.
First, a few of these appear, with a few successful runs in between:
W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Then it starts with this, followed by a giant list of "# chunks of size ..." and "InUse ..." lines:
W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 43.26MiB (rounded to 45360128). Current allocation summary follows.
I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 79, Chunks in use: 79. 19.8KiB allocated for chunks. 19.8KiB in use in bin. 2.2KiB client-requested in use in bin.
...
I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 8.40GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 9109728768 memory_limit_: 9109728789 available bytes: 21 curr_region_allocation_bytes_: 17179869184
I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 9109728789
InUse: 9024084224
MaxInUse: 9024084224
NumAllocs: 38387
MaxAllocSize: 1452673536
W tensorflow/core/common_runtime/bfc_allocator.cc:424]
W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1,45000,12,21] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
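(For scale: a [1, 45000, 12, 21] float32 tensor is 1 × 45000 × 12 × 21 × 4 = 45,360,000 bytes, i.e. the 43.26MiB request above, rounded up to 45360128 by the allocator.)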
Traceback (most recent call last):
File ".\TrainGNet.py", line 380, in <module>
m2t.train_on_batch(inout[0], inout[1])
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
class_weight=class_weight, reset_metrics=reset_metrics)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
output_loss_metrics=model._output_loss_metrics)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
output_loss_metrics=output_loss_metrics))
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 268, in _process_single_batch
grads = tape.gradient(scaled_total_loss, trainable_weights)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 1014, in gradient
unconnected_gradients=unconnected_gradients)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\eager\imperative_grad.py", line 76, in imperative_grad
compat.as_str(unconnected_gradients.value))
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 138, in _gradient_function
return grad_fn(mock_op, *out_grads)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\ops\math_grad.py", line 251, in _MeanGrad
return math_ops.truediv(sum_grad, math_ops.cast(factor, sum_grad.dtype)), None
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 1066, in truediv
return _truediv_python3(x, y, name)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 1005, in _truediv_python3
return gen_math_ops.real_div(x, y, name=name)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 7950, in real_div
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,45000,12,21] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RealDiv] name: truediv/
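Note that the traceback ends in _MeanGrad, i.e. the OOM is raised while back-propagating through the mean reduction in the mse loss, not inside one of my layers.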
EDIT 2 and 3: Here is a minimal example. It fails after printing '11' for me. (Edit 3 reduced the size significantly.)
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Dense, Reshape, Conv1D, TimeDistributed
    import numpy as np
    import tensorflow

    def BuildGenerator():
        i = Input(shape=(None, 2))

        # Inner model: maps the 252 conv features at each timestep
        # to a (12, 21) softmax grid.
        n_input = 12 * 21
        to_n = Input(shape=(n_input,))
        s_n = Dense(12 * 21, activation='softmax')(to_n)
        s_n = Reshape((12, 21))(s_n)
        n_base = Model(inputs=[to_n], outputs=[s_n])

        b = Conv1D(n_input, 11, dilation_rate=1, padding='same',
                   activation='relu', data_format='channels_last')(i)
        n = TimeDistributed(n_base)(b)
        return Model(inputs=[i], outputs=[n])

    def InputGenerator():
        for step in range(1000):
            print(step)
            # One batch: a single sequence of 10*60*1000 = 600,000 timesteps.
            i = np.zeros((1, 10 * 60 * 1000, 2))
            n = np.zeros((1, 10 * 60 * 1000, 12, 21))
            yield ([i], [n])

    with tensorflow.device('/device:gpu:0'):
        m2t = BuildGenerator()
        m2t.compile(optimizer='adam', loss='mse')
        for epoch in range(1):
            for inout in InputGenerator():
                m2t.train_on_batch(inout[0], inout[1])
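For scale, here is my back-of-the-envelope arithmetic (mine, not from the logs) for the large per-batch tensors in this example, assuming float32 at 4 bytes each:

    # Approximate sizes of the big per-batch tensors in the minimal example.
    timesteps = 10 * 60 * 1000            # 600,000 timesteps per sequence
    conv_out = timesteps * (12 * 21) * 4  # Conv1D output, shape (1, 600000, 252)
    target = timesteps * 12 * 21 * 4      # model output/target, shape (1, 600000, 12, 21)
    print(conv_out / 2 ** 20, 'MiB')      # ~576.8 MiB
    print(target / 2 ** 20, 'MiB')        # ~576.8 MiB

So every step moves several ~577 MiB tensors (activations, their gradients, and the target), which would be consistent with the ~9GiB limit filling up after a handful of iterations if something isn't being freed between batches. What I don't understand is what is being held on to.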