
I got

RuntimeError: Dst tensor is not initialized

during the training of my neural network. The error seems to occur when my custom Callback gets the predictions with self.model.predict(self.dataset), because the stack trace points to

  File "mlp_keras.py", line 20, in on_epoch_end
    predictions = self.model.predict(self.dataset)

This is the full stack trace:

Traceback (most recent call last):
  File "mlp_keras.py", line 150, in <module>
    callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
    prefix='val_')
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 771, in on_epoch
    self.callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/callbacks.py", line 302, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "mlp_keras.py", line 20, in on_epoch_end
    predictions = self.model.predict(self.dataset)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
    workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 426, in _model_iteration
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 357, in __init__
    dataset = self.slice_inputs(indices_dataset, inputs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 383, in slice_inputs
    dataset_ops.DatasetV2.from_tensors(inputs).repeat()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 566, in from_tensors
    return TensorDataset(tensors)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2765, in __init__
    element = structure.normalize_element(element)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/util/structure.py", line 113, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: Dst tensor is not initialized.

This is my code:

class KendallTauHistory(Callback):
    def __init__(self, dataset, y_true, groups):
        self.y_true = y_true
        self.dataset = dataset
        self.groups = groups

    def on_epoch_end(self, epoch, logs=None):
        predictions = self.model.predict(self.dataset)
        predictions = predictions.flatten()
        predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, predictions))
        # For batch training
        ranked_predictions = np.array([])
        kendalls = np.array([])
        start_range = 0
        for group in self.groups:
            end_range = (start_range + group[1]) # Batch is a group of words with same group id
            batch_predictions = predictions[start_range:end_range]
            batch_labels = self.y_true[start_range:end_range]
            batch_predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, batch_predictions))
            ranked_predictions = np.append(ranked_predictions, np.floor(rankdata(batch_predictions)))
            kendalls = np.append(kendalls, kendalltau(batch_labels, batch_predictions))
            start_range = end_range
        #self.y_true = self.y_true[0:len(ranked_predictions)]
        print('\nORIGINAL LABELS: {0}\n'.format(self.y_true))
        print('PREDICTED LABELS: {0}'.format(ranked_predictions))
        print("\nEpoch Kendall's tau: {0}".format(np.mean(kendalls)))


model = tf.keras.Sequential()
model.add(LSTM(units=10, batch_input_shape=(None, 2, 839)))
model.add(Dense(15, activation='sigmoid'))

model.summary()

model.compile(loss=listnet_loss, optimizer=keras.optimizers.Nadam(learning_rate=0.000005, beta_1=0.9, beta_2=0.999))
real_labels = np.array([])
losses = np.array([])

with tf.device('/GPU:0'):
    model.fit(training_dataset, training_dataset_labels, epochs=10, workers=10,
              verbose=1, callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])
pairon

Is this the correct way of doing things? Shouldn't compilation of the model be done within with tf.device('/GPU:0'), whereas fit doesn't have to be? Have you resolved the issue? – Krzysztof Maliszewski Aug 22 '21 at 02:04

1 Answer


This error usually means that your GPU ran out of memory while trying to allocate a tensor. I had the same error a week ago while training in a multi-GPU environment with CUDA 10.0 and TensorFlow 1.13.0.

My suggestion is to reduce the batch_size parameter in your .fit() call. If you did not set it explicitly, it defaults to 32. Halve it repeatedly until the error goes away.
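The halving can be sketched as a small retry loop. This is only an illustration: run_with_halving is a hypothetical helper, not part of Keras, and it assumes the failure surfaces as a RuntimeError, as it does in the stack trace of the question.

```python
def run_with_halving(run, batch_size=32, min_batch_size=1):
    """Call run(batch_size); on RuntimeError retry with half the batch size.

    Hypothetical helper. `run` is any callable that takes a batch size,
    for example: lambda bs: model.fit(x, y, batch_size=bs, epochs=10)
    """
    while True:
        try:
            return batch_size, run(batch_size)
        except RuntimeError as err:
            if batch_size <= min_batch_size:
                # Nothing smaller left to try: let the original error propagate.
                raise
            print("batch_size={} failed ({}); retrying with {}".format(
                batch_size, err, batch_size // 2))
            batch_size //= 2
```

A failed attempt may leave the model partly trained and may not release the GPU memory it had taken, so once you know which value works it is probably safer to restart the script with that batch_size set directly.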

You can also read about this error at this link: https://github.com/tensorflow/tensorflow/issues/7025

(Getting "Dst tensor is not initialized." when really the problem is out of GPU memory)

The message is misleading because it does not look like the usual "OOM" error that TensorFlow reports when a memory allocation fails.
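In the stack trace of the question, the allocation that fails is the conversion of the whole input array into a single tensor inside predict (convert_to_eager_tensor, reached via from_tensors). If that is indeed the cause, one possible workaround is to feed the data to the model in slices, so that only one slice is converted at a time. The sketch below is an assumption-laden illustration: predict_in_chunks and chunk_size are made-up names, and predict_fn stands for whatever prediction call you use, such as self.model.predict_on_batch.

```python
import numpy as np


def predict_in_chunks(predict_fn, data, chunk_size=1024):
    """Run predict_fn on consecutive slices of data and stack the results.

    Hypothetical helper; assumes `data` is a non-empty NumPy array whose
    first axis indexes the samples.
    """
    outputs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # np.asarray also accepts eager tensors, not only NumPy arrays.
        outputs.append(np.asarray(predict_fn(chunk)))
    return np.concatenate(outputs, axis=0)


# Possible use inside on_epoch_end (chunk size chosen arbitrarily):
# predictions = predict_in_chunks(self.model.predict_on_batch,
#                                 self.dataset, chunk_size=256)
```

The chunk size plays the same role as batch_size above: if the error persists, halve it.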

Timbus Calin