
I am using a GPU cluster where the submitted jobs are managed by Slurm. I don't have admin/root privileges on that server. I am currently trying to build a project that contains .cpp and .cu files, which I do by calling TORCH_CUDA_ARCH_LIST=7.2 CC=gcc-7 CXX=g++-7 python setup.py install, as the cluster uses CUDA 10.1 and runs V100 GPUs (hence the gencode is sm_70).

However, the build crashes with the following error message:

building <filename> extension
gcc-7 -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes (...): 
error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
error: command '/<somepath>/anaconda3/envs/pytorch14/bin/nvcc' failed with exit status 1

So, as one can see from the gcc-7 call in the second line, the Python script is using the right compiler, but unfortunately the nvcc call uses the system-wide gcc symlink, which is /usr/bin/gcc: symbolic link to gcc-9. I have found a couple of answers online (including this and this) and have tried the suggested steps. But as I don't have root access, I cannot create a new symlink or change the existing one to point at another installed gcc version such as /usr/bin/gcc-7: running ln -s /usr/bin/gcc-7 /usr/bin/gcc gives me ln: failed to create symbolic link '/usr/bin/gcc': File exists, and copying the files into /usr/local/bin, as suggested in other answers on SO, won't work either because of the missing privileges.

I'm really at a loss here and feel that this might be a dead end. Does anybody have any suggestions?

For reference, this is what my setup.py looks like:

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='noise_cuda',
    ext_modules=[
        CUDAExtension('noise_cuda', [
            'noise_cuda.cpp',
            'noise_cuda_kernel.cu',
        ]),
    ],
    cmdclass={
        'build_ext': BuildExtension
    })
  • When using `nvcc` directly, it is no problem to specify the host compiler (see [here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#file-and-path-specifications-compiler-bindir)). However, I have no idea how to do this in this Python context. – paleonix Jul 30 '21 at 08:02
  • Just change PATH to point first to a directory where you have write access (~/bin may already be there), and add the symlink in that directory? – Marc Glisse Jul 31 '21 at 17:50
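
A minimal, untested sketch of the PATH-based workaround from the comments above (assuming gcc-7 and g++-7 are installed under /usr/bin as in the question, and that a directory such as ~/bin can be placed first on PATH):

# create a user-writable bin directory and symlink the older compilers there
mkdir -p ~/bin
ln -s /usr/bin/gcc-7 ~/bin/gcc
ln -s /usr/bin/g++-7 ~/bin/g++
# make sure this directory is searched before /usr/bin
export PATH="$HOME/bin:$PATH"
TORCH_CUDA_ARCH_LIST=7.2 CC=gcc-7 CXX=g++-7 python setup.py install

With no -ccbin given, nvcc should pick up the gcc it finds first on PATH, so nothing under /usr/bin needs to be touched.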

1 Answer


I'm not a PyTorch user, but if I read the docs right, this should work:


import sysconfig
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='noise_cuda',
    ext_modules=[
        CUDAExtension('noise_cuda', [
            'noise_cuda.cpp',
            'noise_cuda_kernel.cu',
        ], extra_compile_args={
            # the 'cxx' key must be present, or BuildExtension raises a KeyError
            'cxx': sysconfig.get_config_var('CFLAGS').split(),
            # tell nvcc to use gcc-7 as the host compiler instead of /usr/bin/gcc
            'nvcc': ['-ccbin=/usr/bin/gcc-7']}),
    ],
    cmdclass={
        'build_ext': BuildExtension
    })
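
With that change, the build should go through with the same invocation as before (assuming, as in the question, that gcc-7 is available at /usr/bin/gcc-7):

TORCH_CUDA_ARCH_LIST=7.2 CC=gcc-7 CXX=g++-7 python setup.py install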


  • Yes, indeed, that's what solved it! I'll update your answer slightly (to put the `extra_compile_args` in the right spot), and note that one has to specify the `cxx` key, or the build will fail with a `KeyError`. Thanks a lot! Really appreciate it :) – masterBroesel Jul 30 '21 at 09:03