Questions tagged [ptxas]

ptxas is an assembler for Parallel Thread eXecution.

ptxas is an NVIDIA-provided tool for assembling Parallel Thread eXecution (PTX) code, as part of the toolchain for targeting NVIDIA GPU computing processors via CUDA. It translates .ptx assembly files into .cubin files (CUDA binary files).

25 questions
19
votes
1 answer

Interpreting the verbose output of ptxas, part I

I am trying to understand resource usage for each of my CUDA threads for a hand-written kernel. I compiled my kernel.cu file to a kernel.o file with nvcc -arch=sm_20 -ptxas-options=-v and I got the following output (passed through c++filt): ptxas…
curiousexplorer
  • 1,217
  • 1
  • 17
  • 24
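The per-kernel resource lines discussed in this question can be picked apart mechanically. Below is a minimal host-side C++ sketch, assuming the verbose report contains a line of the form "ptxas info : Used N registers, ..." as quoted in the excerpts on this page. The names PtxasUsage and parse_used_line are hypothetical helpers introduced for illustration, and the exact wording of ptxas output may differ between toolkit versions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical result type: not part of any NVIDIA tool or API.
struct PtxasUsage {
    bool ok = false;    // true if the line contained a "Used N registers" report
    int registers = 0;  // N, the reported register count
};

// Extracts the register count from one line of verbose ptxas output.
// Assumes the "Used N registers" wording; other lines yield ok == false.
inline PtxasUsage parse_used_line(const std::string& line) {
    PtxasUsage usage;
    const std::size_t pos = line.find("Used ");
    if (pos == std::string::npos) return usage;

    int regs = 0;
    int consumed = 0;  // written by %n only if the whole pattern matched
    const int fields = std::sscanf(line.c_str() + pos,
                                   "Used %d registers%n", &regs, &consumed);
    if (fields == 1 && consumed > 0 && regs >= 0) {
        usage.ok = true;
        usage.registers = regs;
    }
    return usage;
}
```

Feeding each line of the compiler's stderr through parse_used_line and keeping the results where ok is true gives one register count per reported function; memory figures on the same line are deliberately ignored in this sketch.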
9
votes
2 answers

How can I implement a custom atomic function involving several variables?

I'd like to implement this atomic function in CUDA:

    __device__ float lowest; // global var
    __device__ int lowIdx; // global var
    float realNum; // thread reg var
    int index; // thread reg var
    if(realNum < lowest) {
        lowest= realNum; //…
Doug
  • 2,783
  • 6
  • 33
  • 37
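One common way to update two related variables atomically is to pack them into a single 64-bit word and retry a compare-and-swap until it succeeds. The sketch below shows that idea in host-side C++ with std::atomic. It assumes the truncated snippet goes on to store index next to the new minimum, which the excerpt does not show. The helper names pack, unpack_value, unpack_index, and atomic_store_if_lower are hypothetical. In device code, atomicCAS on an unsigned long long int would take the place of compare_exchange_weak.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Packs the float's bit pattern into the high half and the index into the low half.
inline std::uint64_t pack(float value, std::int32_t index) {
    assert(index >= 0);  // the sketch only handles non-negative indices
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return (static_cast<std::uint64_t>(bits) << 32) |
           static_cast<std::uint32_t>(index);
}

inline float unpack_value(std::uint64_t word) {
    const std::uint32_t bits = static_cast<std::uint32_t>(word >> 32);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

inline std::int32_t unpack_index(std::uint64_t word) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(word));
}

// Replaces (lowest, lowIdx) with (realNum, index) only if realNum is lower.
// Both fields change together because they live in one atomic word.
inline void atomic_store_if_lower(std::atomic<std::uint64_t>& slot,
                                  float realNum, std::int32_t index) {
    std::uint64_t seen = slot.load();
    while (realNum < unpack_value(seen)) {
        // On failure, compare_exchange_weak refreshes `seen`, and the loop
        // condition re-checks whether realNum is still the lower value.
        if (slot.compare_exchange_weak(seen, pack(realNum, index))) break;
    }
}
```

The comparison is done on the unpacked float rather than on the raw 64-bit word, so negative values are ordered correctly; the cost is a retry loop instead of a single atomic minimum.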
3
votes
1 answer

How to overcome Stack size warning?

I would like to know the best practice concerning the following type of warning: ptxas warning : Stack size for entry function '_Z11cuda_kernelv' cannot be statically determined. It appears adding the virtual keyword to the destructor of Internal,…
unegare
  • 2,197
  • 1
  • 11
  • 25
3
votes
2 answers

NVCC register usage report in __device__ function

I'm trying to get some information about register usage in my CUDA kernels using the NVCC option --ptxas-options=-v. While everything is OK with __global__ functions, I'm having some difficulties with the __device__ ones since the ptxas info : Used N…
3
votes
1 answer

CUDA ptxas Error "function uses too much shared data"

I have never used CUDA or C++ before, but I am trying to get Ramses GPU from http://www.maisondelasimulation.fr/projects/RAMSES-GPU/html/download.html running. Due to an error in the autogen.sh I used ./configure and got this one working. So the…
Geru
  • 624
  • 1
  • 7
  • 20
2
votes
1 answer

Avoiding unnecessary mov operations in inline PTX

When writing PTX in a separate file, a kernel parameter can be loaded into a register with: .reg .u32 test; ld.param.u32 test, [test_param]; However, when using inline PTX, the Using Inline PTX Assembly in CUDA (version 01) application note…
Roger Dahl
  • 15,132
  • 8
  • 62
  • 82
2
votes
2 answers

Strange results for profiled executed instructions and issued instructions in Fermi GPU (GTX 580)

The PTX version of my kernel looks like this: .version 2.2 .target sm_20, texmode_independent .entry histogram( .param .u32 .ptr .global .align 4 histogram_param_0, .param .u32 .ptr .global .align 4 histogram_param_1 ) { .reg…
Zk1001
  • 2,033
  • 4
  • 19
  • 36
2
votes
1 answer

NVCC separate compilation with PTX output

Just to see what kind of code CUDA is generating, I like to compile to PTX in addition to an object file. Since some of my loop unrolling can take quite a while, I'd like to be able to compile *.cu→*.ptx→*.o instead of wasting time with both…
jozxyqk
  • 16,424
  • 12
  • 91
  • 180
2
votes
1 answer

Function properties for __internal_trig_reduction_slowpathd

At the moment I am trying to optimize some CUDA kernels... If I compile with the option --ptxas-options=-v, I get information about registers and so on. In my case I always get some extra lines, which make no sense to me: ptxas : info : Compiling entry…
raspiede
  • 179
  • 1
  • 1
  • 11
1
vote
1 answer

Debugging inline PTX in Parallel Nsight

Is it possible to view the PTX registers when stepping through inline PTX in Parallel Nsight? I can set breakpoints on inline PTX and step through it, but hovering over the PTX registers does not show their values. I can turn on SASS and hovering on…
Roger Dahl
  • 15,132
  • 8
  • 62
  • 82
1
vote
1 answer

Setting 32 bit address size in inline PTX

I'm in the process of converting PTX written as a separate file to inline PTX. In the separate PTX file, I was defining the ISA and target as follows: .version 1.2 .target sm_13 In the PTX file generated by the compiler, after having inlined the…
Roger Dahl
  • 15,132
  • 8
  • 62
  • 82
1
vote
1 answer

How should I override the GCC runtime compiler options for a specific GPU

Using the GCC compiler and OpenMP, I'm now working on a project that offloads data to an NVIDIA GPU. I need some help figuring out an issue. The default setup uses a virtual GPU that emulates the Kepler architecture (compute…
Charu Jain
  • 36
  • 6
1
vote
1 answer

OpenCL including header causes ptxas fatal: Unresolved extern function

What's wrong: I'm trying to include some C code in my .cl code to call from the kernel. As far as I can tell, the compiler can find the .h files fine, but fails to include the implementation .c file. I'm using C# with OpenCL.NetCore. Source code is…
TheINCGI
  • 35
  • 5
1
vote
2 answers

Interpreting the verbose output of ptxas, part II

This question is a continuation of Interpreting the verbose output of ptxas, part I. When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as: ptxas info :…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
1
vote
1 answer

What is the correct way to support `__shfl()` and `__shfl_sync()` instructions?

From my understanding, CUDA 10.1 removed the shfl instructions: PTX ISA version 6.4 removes the following features: Support for shfl and vote instructions without the .sync qualifier has been removed for .target sm_70 and higher. This support was…
Blizzard
  • 1,117
  • 2
  • 11
  • 28