Questions tagged [nvcc]

"nvcc" is NVIDIA's compiler driver for CUDA C/C++; its device-code compiler is based on LLVM.

This tag refers to nvcc, NVIDIA’s compiler toolchain for its parallel computing architecture, CUDA. Documentation for nvcc is included with the CUDA Toolkit.

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link to any discussion here on Stack Overflow.

688 questions
58 votes, 3 answers

How can I compile CUDA code then link it to a C++ project?

I am looking for help getting started with a project involving CUDA. My goal is to have a project that I can compile with the native g++ compiler but that uses CUDA code. I understand that I have to compile my CUDA code with the nvcc compiler, but from my…
Matthew
51 votes, 1 answer

CUDA: How to use -arch and -code and SM vs COMPUTE

I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that there is machine code as well as PTX code embedded in my binary and that this can be controlled via the controller switches…
bweber
43 votes, 2 answers

What is the purpose of using multiple "arch" flags in Nvidia's NVCC compiler?

I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures. From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute architecture required by the programmer's…
James Paul Turner
35 votes, 6 answers

Nvcc missing when installing cudatoolkit?

I have installed CUDA along with PyTorch using conda install pytorch torchvision cudatoolkit=10.0 -c pytorch. However, it seems that nvcc was not installed along with it. If I want to use, for example, nvcc -V, I get the error that nvcc was not found, and…
Luca Thiede
29 votes, 4 answers

Simplest Possible Example to Show GPU Outperform CPU Using CUDA

I am looking for the most concise code possible that can be written for both a CPU (using g++) and a GPU (using nvcc) and for which the GPU consistently outperforms the CPU. Any type of algorithm is acceptable. To clarify: I'm literally looking…
Chris Redford
27 votes, 3 answers

How can I prevent C++ guessing a second template argument?

I'm using a C++ library (strf) which, somewhere within it, has the following code: namespace strf { template <typename ForwardIt> inline auto range(ForwardIt begin, ForwardIt end) { /* ... */ } template inline auto…
einpoklum
23 votes, 2 answers

How can I get the nvcc CUDA compiler to optimize more?

When using a C or C++ compiler, if we pass the -O3 switch, execution becomes faster. In CUDA, is there something equivalent? I am compiling my code using the command nvcc filename.cu. After that I execute ./a.out.
user12290
22 votes, 4 answers

Linking error: DSO missing from command line

I am rather new to Linux (using Ubuntu 14.04 LTS 64bit), coming from Windows, and am attempting to port over an existing CUDA project of mine. When linking via /usr/local/cuda/bin/nvcc -arch=compute_30 -code=sm_30,compute_30 -o Main.o Display.o…
Will Bolden
21 votes, 7 answers

How to disable a specific nvcc compiler warning

I want to disable a specific compiler warning with nvcc, specifically warning: NULL reference is not allowed. The code I am working on uses NULL references as part of SFINAE, so they can't be avoided. An ideal solution would be a #pragma in just…
bcumming
21 votes, 5 answers

error: cuda_runtime.h: No such file or directory

How can I force gcc to look in /usr/cuda/local/include for cuda_runtime.h? I'm attempting to compile a CUDA application with a C wrapper. I'm running Ubuntu 10.04. I've successfully compiled my CUDA application into a .so with the following…
skrieder
18 votes, 1 answer

Can you `= delete` a templated function on a second declaration?

Consider the following code: template int foo(); template int foo() = delete; Is this valid C++11? GCC (9.1) says: Yes! clang (8.0) says: No! nvcc (9.2) says: No! MSVC (19.20) says: Yes! (in C++14 mode, it doesn't support…
einpoklum
18 votes, 3 answers

Problems when running nvcc from command line

I need to compile a CUDA .cu file using nvcc from the command line. The file is "vectorAdd_kernel.cu" and contains the following piece of code: extern "C" __global__ void VecAdd_kernel(const float* A, const float* B, float* C, int N) { int i =…
user1738687
16 votes, 2 answers

Unsupported gpu architecture compute_30 on a CUDA 5 capable gpu

I'm currently trying to compile Darknet on the latest CUDA toolkit which is version 11.1. I have a GPU capable of running CUDA version 5 which is a GeForce 940M. However, while rebuilding darknet using the latest CUDA toolkit, it said nvcc fatal …
3MP The Rook
16 votes, 1 answer

Limiting register usage in CUDA: __launch_bounds__ vs maxrregcount

From the NVIDIA CUDA C Programming Guide: Register usage can be controlled using the maxrregcount compiler option or launch bounds as described in Launch Bounds. From my understanding (and correct me if I'm wrong), while -maxrregcount limits…
Kelsius
16 votes, 2 answers

Can I use C++11 in the .cu-files (CUDA5.5) in Windows7x64 (MSVC) and Linux64 (GCC4.8.2)?

When I compile the following code containing C++11 constructs in Windows7x64 (MSVS2012 + Nsight 2.0 + CUDA5.5), I do not get errors, and everything compiles and works well: #include int main() { …
Alex