Questions tagged [gpu-cooperative-groups]

15 questions
4
votes
3 answers

Can I launch a cooperative kernel without passing an array of pointers?

The CUDA runtime API allows us to launch kernels using the variable-number-of-arguments triple-chevron syntax: my_kernel<<<...>>>(first_arg, second_arg, and_as_many, as_we, want_to, etc, etc); but as regards…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
3
votes
1 answer

Should thread_block type be passed by reference?

Question: When passing thread_group type objects to a device function, is there a preference for passing by reference vs. passing by value? Is one of them "correct"? What are the differences between the approaches? When should each approach be…
John Mansell
  • 624
  • 5
  • 16
2
votes
1 answer

The Unresolved extern function 'cudaCGGetIntrinsicHandle' strikes back

I'm building the examples for my cuda-api-wrappers repo. A while ago, I encountered a problem: Whenever I tried compiling a .cu file with code using the "cooperative groups" mechanism, I got an error message: ptxas fatal : Unresolved extern…
einpoklum
2
votes
1 answer

Two consecutive kernels or whole-grid cooperative group synchronization?

Suppose I have two tasks to run on a GPU, the second of which relies on essentially all work by the first. Traditionally, I would essentially have to write these tasks as two separate kernels and schedule the second to run at some point after the…
einpoklum
2
votes
1 answer

cooperative_groups::this_grid() causes any CUDA API call to return 'unknown error'

Following the same steps as the CUDA samples to launch a kernel and sync across the grid using cooperative_groups::this_grid().sync() causes any CUDA API call to fail, while using cooperative_groups::this_thread_block().sync() works fine and gives…
ahmed
  • 59
  • 5
2
votes
1 answer

CUDA Cooperative Groups : Linking error

After reading about Cooperative Groups in CUDA 9, I've been trying to synchronize at a grid level. I'm using Visual Studio 2017, a GTX 1060 and CUDA 9.1. I altered my code as follows: __global__ void ExplicitKernel_American(/* ... */) { int i =…
1
vote
2 answers

Deadlocks with cuda cooperative groups

In the CUDA Programming Guide, in the section about Cooperative Groups, there is an example of grid-local synchronization: grid_group grid = this_grid(); grid.sync(); Unfortunately, I didn't find a precise definition of grid.sync() behavior. Is it…
user8044236
1
vote
1 answer

How can I use the cooperative groups feature of CUDA on Windows?

My GPU is a GeForce MX150 (Pascal architecture, CC 6.1), with CUDA 9.1 on Windows 10. Although my GPU is Pascal, cooperative groups don't work. I want to use them for inter-block synchronization. I found that TCC mode isn't active. I also found that …
0
votes
1 answer

Memory allocation is not permitted when running kernel with cudaLaunchCooperativeKernel and -rdc=true

An error "operation not permitted" is generated when running the following code. Is there anything I am missing? I'm running it with compute capabilities 7.5 and the command nvcc test.cu -rdc=true. It works without RDC. #include…
Pierre T.
  • 380
  • 1
  • 13
0
votes
1 answer

In CUDA, how can I get this warp's thread mask in conditionally executed code (in order to execute e.g., __shfl_sync or .shfl)?

I'm trying to update some older CUDA code (pre CUDA 9.0), and I'm having some difficulty updating usage of warp shuffles (e.g., __shfl). Basically the relevant part of the kernel might be something like this: int f = d[threadIdx.x]; int warpLeader =…
sg_man
  • 763
  • 1
  • 6
  • 14
0
votes
1 answer

How to run a CUDA cooperative template kernel

I am trying, unsuccessfully, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong? Error: cannot determine which instance of function template "boolPrepareKernel" is intended. I try to invoke the kernel like…
Jakub Mitura
  • 159
  • 1
  • 14
0
votes
1 answer

error: class "cooperative_groups::__v1::thread_block" has no member "is_valid"

We are trying to use some of the cooperative groups features in CUDA to write a small application. We are using a Tesla V100 card with CUDA 11.0. But on using the is_valid() method of thread_block, the error below is observed: error: class…
Satyanvesh D
  • 323
  • 1
  • 4
  • 16
0
votes
1 answer

Why is cudaLaunchCooperativeKernel() returning not permitted?

So I am using a GTX 1050 with a compute capability of 6.1 and CUDA 11.0. I need to use grid synchronization in my program, so cudaLaunchCooperativeKernel() is needed. I have checked my device query, so the GPU does have support for cooperative groups.…
0
votes
0 answers

CUDA kernel stops working when using cooperative groups grid sync() function

I was trying to implement some CUDA kernels leveraging the cooperative groups feature when I found that the kernel wouldn't do anything at all. I boiled it down to a small example which can be seen below. As soon as I call the sync() function…
KoKlA
  • 898
  • 2
  • 11
  • 15
0
votes
0 answers

Why does nvcc refuse to link this simple cooperative-groups program?

Consider the following CUDA program, in a file named foo.cu: #include #include __global__ void my_kernel() { auto g = cooperative_groups::this_grid(); g.sync(); } int main(int, char **) { …
einpoklum