
I'm trying to run a TensorFlow model for the first time on an NVIDIA Titan RTX, but I'm getting an error.

CUDA version

$ cat /usr/local/cuda/version.json
{
   "cuda" : {
      "name" : "CUDA SDK",
      "version" : "11.3.20210326"
   },
...

Python 3.9.1 and TensorFlow 2.5.0-rc1

Traceback (most recent call last):
  File "/home/marcus/COVID-19-forecasting/COVID-19/run_experiments.py", line 23, in <module>
    exp.run_experiments(dat.horizon, dat.pad_val, dat.padded_scaled_train, dat.multi_out_scaled_val, dat.padded_scaled_test_x,
  File "/home/marcus/COVID-19-forecasting/COVID-19/experiment.py", line 110, in run_experiments
    lstm_hist = lstm.fit([tr, enc_names], [v[0], v[1], v[2]], self.epochs, verbose=0)
  File "/home/marcus/COVID-19-forecasting/COVID-19/model.py", line 55, in fit
    return self.model.fit(x=x, y=y, epochs=epochs, callbacks=callbacks, verbose=verbose)
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/home/marcus/COVID-19-forecasting/covid-venv/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:    Fail to find the dnn implementation.
         [[{{node cond_40/then/_0/cond/CudnnRNNV3}}]]
         [[multi_output_rnn/encoder_block/rnn_encoder/PartitionedCall]] [Op:__inference_train_function_6309]

Function call stack:
train_function -> train_function -> train_function

I tried adding these lines to my code, but nothing changed.

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
talonmies
Marcus

1 Answer


I'm not sure whether this is a bug or a problem with the computer I'm using, but the combination of Python 3.9 and TensorFlow 2.5 does not seem to work on this GPU.

My solution was to install Python 3.8 and then, inside a venv, install TensorFlow 2.4. With that setup my script worked.
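One possible explanation, not confirmed in this thread, is that the CUDA/cuDNN libraries on the machine do not match the ones the TensorFlow build expects. TensorFlow's tested build configurations list CUDA 11.2 with cuDNN 8.1 for 2.5 and CUDA 11.0 with cuDNN 8.0 for 2.4, while the machine reports CUDA 11.3. Below is a minimal sketch for printing what an installed build was compiled against. The helper name `tf_build_summary` is made up for illustration, and it assumes TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available.

```python
import importlib


def tf_build_summary():
    """Summarize the installed TensorFlow build, or return None if it is not installed.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None

    # Build-time information; the CUDA/cuDNN keys are absent on CPU-only builds,
    # so .get() is used and those entries become None.
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda_build": info.get("cuda_version"),
        "cudnn_build": info.get("cudnn_version"),
        # GPUs that TensorFlow can actually see at runtime.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tf_build_summary())
```

Comparing the reported `cuda_build` and `cudnn_build` values with the libraries actually installed under /usr/local/cuda may show whether the versions line up. An empty `gpus` list would suggest TensorFlow cannot load the GPU libraries at all.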

Marcus