
I want to test a GitHub repository for my work:

https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

so I ran

git clone https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

Unfortunately, I get an error when I run the command

sh experiments/run_mnist_en_fanogan.sh

(from the GitHub README):

sh experiments/run_mnist_en_fanogan.sh

/home/svetlana/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning: 
NVIDIA GeForce RTX 3080 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/svetlana/.local/lib/python3.9/site-packages/torchvision/datasets/mnist.py:498:      UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Traceback (most recent call last):
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 30, in <module>
    main()
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 24, in main
    model.train()
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 155, in train
    self.gan_training(epoch)
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 93, in gan_training
    fake_imgs = self.net_Gds[i_G](z)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/networks.py", line 175, in forward
    output = self.main(input)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
    return F.conv_transpose2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

I thought my installation was OK, but now I have doubts. This is my setup:

Python 3.9.6 (default, Jun 30 2021, 10:22:16)

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0


import torch
print(torch.__version__)
1.9.0+cu102

I installed cuDNN for CUDA 11.4 from the NVIDIA website (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html). I don't know the command to check its version; I tried this one:

cat /opt/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

but it returns nothing.

I tried the solutions found here: "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize",

without success (I used nvtop to monitor VRAM).

Picot

1 Answer


@Berriel

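You're right, I was focused on the error and ignored the warning above it. The warning says that my PyTorch build (1.9.0+cu102) only supports sm_37, sm_50, sm_60 and sm_70, while the RTX 3080 Laptop GPU is sm_86.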

To solve my problem, I ran

pip uninstall torch torchvision torchaudio

Then

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

This follows the instructions at

https://pytorch.org/get-started/locally/

(this link is from the warning message)
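
To double-check that a reinstalled build actually matches the GPU, a small script along these lines may help. It is only a sketch: `arch_supports` and `report` are hypothetical helper names, and the compatibility rule (a build is treated as usable when it contains an architecture with the same major compute capability as the device) is a simplifying assumption, not necessarily the exact check PyTorch performs. The `torch` calls used (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.get_device_capability()`, `torch.cuda.get_arch_list()`) are regular PyTorch functions.

```python
import importlib.util


def arch_supports(capability, arch_list):
    """Return True if a compiled arch shares the GPU's major compute capability.

    Simplifying assumption: binaries are treated as compatible within one
    major version (for example, an sm_80 build serving an sm_86 device).
    """
    major = capability[0]
    compiled_majors = {
        int(arch.split("_")[1]) // 10
        for arch in arch_list
        if arch.startswith("sm_")  # skip entries such as "compute_86"
    }
    return major in compiled_majors


def report():
    """Print the PyTorch build details relevant to the warning, if torch is installed."""
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed")
        return
    import torch

    print("torch:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("compiled archs:", archs)
        print("supported (by the simplified rule):", arch_supports(cap, archs))
```

Calling `report()` once with the old build and once after the reinstall shows whether the list of compiled architectures changed. With the values from the warning, `arch_supports((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` returns `False`. The cuDNN line also answers the version question without searching for header files, since it reports the cuDNN that PyTorch itself uses.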

Picot