Highest Voted 'ptxas' Questions

19 votes, 1 answer

Interpreting the verbose output of ptxas, part I

I am trying to understand resource usage for each of my CUDA threads for a hand-written kernel. I compiled my kernel.cu file to a kernel.o file with nvcc -arch=sm_20 -ptxas-options=-v and I got the following output (passed through c++filt): ptxas…

asked Sep 12 '12 at 12:29

curiousexplorer


9 votes, 2 answers

How can I implement a custom atomic function involving several variables?

I'd like to implement this atomic function in CUDA: __device__ float lowest; // global var __device__ int lowIdx; // global var float realNum; // thread reg var int index; // thread reg var if(realNum < lowest) { lowest= realNum; //…

cuda atomic gpu-atomics ptxas

asked Jul 01 '13 at 18:46

Doug


3 votes, 1 answer

How to overcome Stack size warning?

I would like to know the best practice concerning the following type of warning: ptxas warning : Stack size for entry function '_Z11cuda_kernelv' cannot be statically determined. It appears adding the virtual keyword to the destructor of Internal,…

c++ cuda stack ptxas

asked Mar 28 '22 at 20:21

unegare


3 votes, 2 answers

NVCC register usage report in device function

I'm trying to get some information about register usage in my CUDA kernels using the NVCC option --ptxas-options=-v. With global functions everything is OK, but I'm having some difficulties with the device ones, since the ptxas info : Used N…

cuda gpu nvcc ptxas

asked Aug 30 '17 at 09:45

Christopher23


3 votes, 1 answer

CUDA ptxas Error "function uses too much shared data"

I have never used CUDA or C++ before, but I am trying to get Ramses GPU from http://www.maisondelasimulation.fr/projects/RAMSES-GPU/html/download.html running. Due to an error in the autogen.sh I used ./configure and got this one working. So the…

c++ cuda gpu-shared-memory ptxas

asked May 14 '14 at 07:45

Geru


2 votes, 1 answer

Avoiding unnecessary mov operations in inline PTX

When writing PTX in a separate file, a kernel parameter can be loaded into a register with: .reg .u32 test; ld.param.u32 test, [test_param]; However, when using inline PTX, the Using Inline PTX Assembly in CUDA (version 01) application note…

cuda inline-assembly ptxas

asked Mar 31 '12 at 14:51

Roger Dahl


2 votes, 2 answers

Strange results for profiled executed instructions and issued instructions in Fermi GPU (GTX 580)

My kernel has the ptx version like this: .version 2.2 .target sm_20, texmode_independent .entry histogram( .param .u32 .ptr .global .align 4 histogram_param_0, .param .u32 .ptr .global .align 4 histogram_param_1 ) { .reg…

cuda opencl gpu gpgpu ptxas

asked Jul 06 '11 at 08:14

Zk1001


2 votes, 1 answer

NVCC separate compilation with PTX output

Just to see what kind of code CUDA is generating I like to compile to ptx in addition to an object file. Since some of my loop unrolling can take quite a while I'd like to be able to compile *.cu→*.ptx→*.o instead of wasting time with both…

gcc cuda nvcc ptxas

asked Jan 21 '14 at 04:37

jozxyqk


2 votes, 1 answer

Function properties for __internal_trig_reduction_slowpathd

At the moment I am trying to optimize some CUDA kernels. If I compile with the option --ptxas-options=-v, I get the information about registers and so on. In my case I always get some extra lines, which make no sense to me: ptxas : info : Compiling entry…

c cuda nvcc ptxas

asked Jun 05 '13 at 13:20

raspiede


1 vote, 1 answer

Debugging inline PTX in Parallel Nsight

Is it possible to view the PTX registers when stepping through inline PTX in Parallel Nsight? I can set breakpoints on inline PTX and step through it, but hovering over the PTX registers does not show their values. I can turn on SASS and hovering on…

cuda inline-assembly nvcc nsight ptxas

asked Apr 03 '12 at 02:50

Roger Dahl


1 vote, 1 answer

Setting 32 bit address size in inline PTX

I'm in the process of converting PTX written as a separate file to inline PTX. In the separate PTX file, I was defining the ISA and target as follows: .version 1.2 .target sm_13 In the PTX file generated by the compiler, after having inlined the…

cuda inline-assembly nvcc ptxas

asked Mar 31 '12 at 18:38

Roger Dahl


1 vote, 1 answer

How should I override the GCC runtime compiler options for a specific GPU?

Using the GCC compiler and OpenMP, I'm working on a project that offloads data to an Nvidia GPU, and I need some help figuring out an issue. The default setup uses a virtual GPU that emulates the Kepler architecture (compute…

gcc gpu openmp hpc ptxas

asked Mar 17 '23 at 14:02

Charu Jain


1 vote, 1 answer

OpenCL including header causes ptxas fatal: Unresolved extern function

What's wrong: I'm trying to include some C code in my .cl code to call from the kernel. As far as I can tell, the compiler can find the .h files fine, but fails to include the implementation .c file. I'm using C# with OpenCL.NetCore. Source code is…

c# c include opencl ptxas

asked Dec 09 '22 at 21:17

TheINCGI


1 vote, 2 answers

Interpreting the verbose output of ptxas, part II

This question is a continuation of Interpreting the verbose output of ptxas, part I. When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as: ptxas info :…

cuda verbose gpu-shared-memory gpu-constant-memory ptxas

asked May 16 '19 at 20:41

einpoklum


1 vote, 1 answer

What is the correct way to support `shfl()` and `shfl_sync()` instructions?

From my understanding, CUDA 10.1 removed the shfl instructions: "PTX ISA version 6.4 removes the following features: Support for shfl and vote instructions without the .sync qualifier has been removed for .target sm_70 and higher. This support was…

cuda ptx ptxas

asked Apr 29 '19 at 21:57

Blizzard

