RuntimeError: Dst tensor is not initialized in Tensorflow

Question

I got

RuntimeError: Dst tensor is not initialized

during the training of my neural network. Specifically, the error seems to occur when my custom Callback gets the predictions with self.model.predict(self.dataset), because the stack trace says

File "mlp_keras.py", line 20, in on_epoch_end predictions = self.model.predict(self.dataset)

This is the full stack trace:

Traceback (most recent call last):
  File "mlp_keras.py", line 150, in <module>
    callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
    prefix='val_')
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 771, in on_epoch
    self.callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/callbacks.py", line 302, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "mlp_keras.py", line 20, in on_epoch_end
    predictions = self.model.predict(self.dataset)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
    workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 426, in _model_iteration
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 357, in __init__
    dataset = self.slice_inputs(indices_dataset, inputs)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 383, in slice_inputs
    dataset_ops.DatasetV2.from_tensors(inputs).repeat()
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 566, in from_tensors
    return TensorDataset(tensors)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2765, in __init__
    element = structure.normalize_element(element)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/util/structure.py", line 113, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: Dst tensor is not initialized.

This is my code:

# Imports inferred from usage; they were not shown in the original post.
import numpy as np
import tensorflow as tf
from scipy.stats import kendalltau, rankdata
from tensorflow import keras
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import LSTM, Dense


class KendallTauHistory(Callback):
    def __init__(self, dataset, y_true, groups):
        self.y_true = y_true
        self.dataset = dataset
        self.groups = groups

    def on_epoch_end(self, epoch, logs=None):
        # This is the call that raises the error (line 20 in the stack trace)
        predictions = self.model.predict(self.dataset)
        predictions = predictions.flatten()
        # Add small uniform noise in [-0.01, 0.01) to each prediction
        predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, predictions))
        # For batch training
        ranked_predictions = np.array([])
        kendalls = np.array([])
        start_range = 0
        for group in self.groups:
            end_range = (start_range + group[1])  # Batch is a group of words with same group id
            batch_predictions = predictions[start_range:end_range]
            batch_labels = self.y_true[start_range:end_range]
            batch_predictions = list(map(lambda element: element + np.random.uniform(0.0, 1.0) * 0.02 - 0.01, batch_predictions))
            ranked_predictions = np.append(ranked_predictions, np.floor(rankdata(batch_predictions)))
            kendalls = np.append(kendalls, kendalltau(batch_labels, batch_predictions))
            start_range = end_range
        #self.y_true = self.y_true[0:len(ranked_predictions)]
        print('\nORIGINAL LABELS: {0}\n'.format(self.y_true))
        print('PREDICTED LABELS: {0}'.format(ranked_predictions))
        print("\nEpoch Kendall's tau: {0}".format(np.mean(kendalls)))


# training_dataset, training_dataset_labels, groups_id_count and listnet_loss
# are not shown in the post.
model = tf.keras.Sequential()
model.add(LSTM(units=10, batch_input_shape=(None, 2, 839)))
model.add(Dense(15, activation='sigmoid'))

model.summary()

model.compile(loss=listnet_loss, optimizer=keras.optimizers.Nadam(learning_rate=0.000005, beta_1=0.9, beta_2=0.999))
real_labels = np.array([])
losses = np.array([])

with tf.device('/GPU:0'):
    model.fit(training_dataset, training_dataset_labels, epochs=10, workers=10,
              verbose=1, callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])

Is this the correct way of doing things? Shouldn't compilation of the model be done within with tf.device('/GPU:0') whereas fit doesn't have to be? Have you resolved the issue? — Krzysztof Maliszewski, Aug 22 '21 at 02:04

score 1 · Answer 1 · answered Feb 24 '20 at 14:59

This error usually means that your GPU ran out of memory while trying to allocate a tensor. In fact, I had the same error one week ago while training in a multi-GPU environment with CUDA 10.0 and TensorFlow 1.13.0.

My suggestion is to reduce the batch_size parameter in your .fit() call. If you did not set it explicitly, it defaults to 32. Halve it repeatedly until the error disappears.
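The halving strategy can be sketched as a small retry loop. This is only an illustration: find_working_batch_size and run are hypothetical names, run stands for any callable that wraps a call such as model.fit or model.predict, and catching RuntimeError is an assumption based on the exception type shown in this question. After a real out-of-memory failure the process may not recover cleanly, so in practice each attempt may need a fresh process.

```python
def find_working_batch_size(run, start=32, minimum=1, errors=(RuntimeError,)):
    """Call run(batch_size), halving batch_size after each failure.

    `run` is a hypothetical callable wrapping e.g. model.fit or model.predict.
    Returns the first batch size for which run() did not raise.
    """
    minimum = max(minimum, 1)  # never retry with a batch size below 1
    batch_size = start
    while batch_size >= minimum:
        try:
            run(batch_size)
            return batch_size
        except errors:
            batch_size //= 2  # halve and try again
    raise RuntimeError("no batch size >= {} worked".format(minimum))
```

If even the smallest batch size fails, the function raises, which is a hint that the failing allocation does not depend on batch_size.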

You can also read about this error at this link: https://github.com/tensorflow/tensorflow/issues/7025

(Getting "Dst tensor is not initialized." when really the problem is out of GPU memory)

The error message is misleading, because it does not look like the "OOM error" that TensorFlow typically reports when memory allocation fails.

Timbus Calin


Thank you. I noticed that I posted outdated code: my current batch_size is 2. – pairon Feb 24 '20 at 15:01
Please also update to at least Python 3.6. Also, can you tell me which GPU you are using? – Timbus Calin Feb 24 '20 at 15:02
Python3 might be a problem because I'm on a remote machine and I think pip3 is not installed. The GPU is a Tesla P100 – pairon Feb 24 '20 at 15:14
If you carefully read the stack trace in the link I provided, this error only comes up due to memory issues. – Timbus Calin Feb 24 '20 at 15:48
Try self.model.predict(...., batch_size=2) in your callback and see if it works. – Timbus Calin Feb 24 '20 at 15:53
It must be related to that exact line if it does not break during training. – Timbus Calin Feb 24 '20 at 15:56
Nothing changes unfortunately. – pairon Feb 24 '20 at 22:41
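
The trace in the question fails inside convert_to_eager_tensor, reached from from_tensors(inputs). That suggests the whole input array is converted into a single tensor before any batching happens. If that reading is right, lowering batch_size would not shrink that allocation, which would match the last comment. One thing to try, as an untested sketch, is to slice the array on the host and call predict once per slice. iter_chunks is a hypothetical helper, and the chunk size of 1024 in the usage comment is an arbitrary assumption.

```python
def iter_chunks(data, chunk_size):
    """Yield consecutive slices of `data` with at most `chunk_size` rows each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Hypothetical use inside on_epoch_end, assuming self.dataset is a NumPy array:
#     predictions = np.concatenate(
#         [self.model.predict(chunk) for chunk in iter_chunks(self.dataset, 1024)])
```

Each predict call then only needs to convert one slice, at the cost of more calls per epoch.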
