When attempting to train a model using the test MovieLens dataset and the instructions here, I get the following errors:

    ./mltrain.sh local ../data u.data

    Mon Jul  6 08:53:06 UTC 2020
    WARNING:tensorflow:From /home/XXXXXXXX/tensorflow-recommendation-wals/wals_ml_engine/trainer/task.py:176: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
    WARNING:tensorflow:From /home/XXXXXXXX/tensorflow-recommendation-wals/wals_ml_engine/trainer/task.py:176: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
    WARNING:tensorflow:From trainer/model.py:267: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
    INFO:tensorflow:Train Start: 2020-07-06 08:53:14
    trainer/wals.py:94: RuntimeWarning: divide by zero encountered in divide
      frac = np.array(1.0/(data > 0.0).sum(axis))
    WARNING:tensorflow:From trainer/wals.py:57: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
    2020-07-06 08:53:14.592099: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2020-07-06 08:53:14.592138: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
    2020-07-06 08:53:14.592170: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (sn-ga-recommender-model): /proc/driver/nvidia/version does not exist
    2020-07-06 08:53:14.592473: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    2020-07-06 08:53:14.599363: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000160000 Hz
    2020-07-06 08:53:14.599627: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564d65170fd0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-07-06 08:53:14.599656: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    INFO:tensorflow:Train Finish: 2020-07-06 08:53:15
    INFO:tensorflow:train RMSE = 0.89
    INFO:tensorflow:test RMSE = 1.07
    Mon Jul  6 08:53:15 UTC 2020
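
Of the messages above, only the `RuntimeWarning` at `trainer/wals.py:94` comes from the training code's own arithmetic; the rest are deprecation notices and CUDA/CPU diagnostics, and the run still finishes and reports RMSE values. The following is a minimal sketch of how that expression can raise the warning. It assumes a small dense NumPy array in place of the real ratings matrix, and the helper `reciprocal_counts` is hypothetical, not part of the repository.

```python
import warnings

import numpy as np


def reciprocal_counts(data, axis):
    """Reciprocal of the count of positive entries along `axis`.

    Hypothetical helper: returns 0.0 where the count is zero
    instead of dividing by zero.
    """
    counts = (data > 0.0).sum(axis).astype(float)
    out = np.zeros_like(counts)
    # Divide only where the count is non-zero; other slots keep 0.0.
    np.divide(1.0, counts, out=out, where=counts > 0)
    return out


# Toy ratings (assumption): the middle column has no ratings at all.
ratings = np.array([[5.0, 0.0, 3.0],
                    [0.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0]])

# Same expression as the line quoted in the log, with warnings recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    naive = np.array(1.0 / (ratings > 0.0).sum(0))

safe = reciprocal_counts(ratings, 0)

print(naive)   # contains inf for the empty column
print(safe)    # contains 0.0 for the empty column
```

If this is what happens in the real run, the warning would indicate that at least one row or column of the input has no positive ratings, rather than a failure of training itself.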
Matt Evans
