Questions tagged [gpu-cooperative-groups]
15 questions
4
votes
3 answers
Can I launch a cooperative kernel without passing an array of pointers?
The CUDA runtime API allows us to launch kernels using the variable-number-of-arguments triple-chevron syntax:
my_kernel<<<grid_dims, block_dims>>>(
first_arg, second_arg, and_as_many, as_we, want_to, etc, etc);
but as regards…
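
One way the packing could be hidden from the caller, shown as a host-only sketch: a variadic wrapper builds the array of pointers and forwards it to the launch call. `cudaLaunchCooperativeKernel` does take its kernel arguments as a `void**` array; everything else here is an assumption for illustration. `mock_cooperative_launch` is a hypothetical stand-in for the real call, and `launch_cooperative` and `demo_kernel` are made-up names, not CUDA APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for cudaLaunchCooperativeKernel: it only
// records the kernel address and the argument pointers it receives.
static const void* g_last_func = nullptr;
static std::vector<void*> g_last_args;

void mock_cooperative_launch(const void* func, void** args, std::size_t count) {
    g_last_func = func;
    g_last_args.assign(args, args + count);
}

// Hypothetical variadic wrapper: builds the array of pointers internally, so
// the caller writes launch_cooperative(kernel, a, b) instead of packing by hand.
// Arguments must be lvalues, and each should already have the exact type of
// the matching kernel parameter, because only its address is passed on.
template <typename... KernelParams, typename... Args>
void launch_cooperative(void (*kernel)(KernelParams...), Args&... args) {
    static_assert(sizeof...(KernelParams) == sizeof...(Args),
                  "argument count must match the kernel signature");
    // The trailing nullptr keeps the array non-empty for zero-argument kernels.
    void* packed[sizeof...(Args) + 1] = {
        const_cast<void*>(static_cast<const void*>(&args))..., nullptr};
    mock_cooperative_launch(reinterpret_cast<const void*>(kernel), packed,
                            sizeof...(Args));
}

// Hypothetical kernel stand-in; in CUDA this would be a __global__ function.
void demo_kernel(int, float) {}
```

With `int n = 7; float scale = 2.5f;`, the call `launch_cooperative(demo_kernel, n, scale);` hands the mock a two-element array holding `&n` and `&scale`. In real code the mock would be replaced by the runtime call, with grid and block dimensions added to the wrapper's parameters.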

einpoklum
3
votes
1 answer
Should thread_block type be passed by reference?
Question
When passing thread_group type objects to a device function, is there a preference for passing by reference vs passing by value?
Is one of them "correct"?
What are the differences between the approaches?
When should each approach be…

John Mansell
2
votes
1 answer
The Unresolved extern function 'cudaCGGetIntrinsicHandle' strikes back
I'm building the examples for my cuda-api-wrappers repo. A while ago, I encountered a problem: whenever I tried compiling a .cu file with code using the "cooperative groups" mechanism, I got an error message:
ptxas fatal : Unresolved extern…

einpoklum
2
votes
1 answer
Two consecutive kernels or whole-grid cooperative group synchronization?
Suppose I have two tasks to run on a GPU, the second of which relies on essentially all the work done by the first. Traditionally, I would have to write these tasks as two separate kernels and schedule the second to run at some point after the…

einpoklum
2
votes
1 answer
cooperative_groups::this_grid() causes any CUDA API call to return 'unknown error'
Following the same steps as the CUDA samples to launch a kernel and sync across the grid using cooperative_groups::this_grid().sync() causes any CUDA API call to fail. While using
cooperative_groups::this_thread_block().sync() works fine and gives…

ahmed
2
votes
1 answer
CUDA Cooperative Groups : Linking error
After reading about Cooperative Groups in CUDA 9, I've been trying to synchronize at a grid level.
I'm using Visual Studio 2017, a GTX 1060 and CUDA 9.1.
I altered my code as follows:
__global__ void ExplicitKernel_American(/* ... */) {
int i =…

user3821901
1
vote
2 answers
Deadlocks with cuda cooperative groups
In the CUDA Programming Guide in the section about Cooperative Groups, there is an example of grid-local synchronization:
grid_group grid = this_grid();
grid.sync();
Unfortunately, I didn't find a precise definition of grid.sync() behavior. Is it…
user8044236
1
vote
1 answer
How can I use the cooperative groups feature of CUDA on Windows?
My GPU is a GeForce MX150 (Pascal architecture, CC 6.1), with CUDA 9.1 on Windows 10.
Although my GPU is Pascal, cooperative groups don't work. I want to use them for inter-block synchronization.
I found that TCC mode is not active.
I also found that …

pedram64
0
votes
1 answer
Memory allocation is not permitted when running kernel with cudaLaunchCooperativeKernel and -rdc=true
An error "operation not permitted" is generated when running the following code. Is there anything I am missing?
I'm running it with compute capability 7.5 and the command nvcc test.cu -rdc=true. It works without RDC.
#include…

Pierre T.
0
votes
1 answer
In CUDA, how can I get this warp's thread mask in conditionally executed code (in order to execute, e.g., __shfl_sync or .shfl)?
I'm trying to update some older CUDA code (pre CUDA 9.0), and I'm having some difficulty updating usage of warp shuffles (e.g., __shfl).
Basically the relevant part of the kernel might be something like this:
int f = d[threadIdx.x];
int warpLeader =…

sg_man
0
votes
1 answer
How to run cuda cooperative template kernel
I am trying, unsuccessfully, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong?
error
Error: cannot determine which instance of function template "boolPrepareKernel" is intended
I try to invoke the kernel like…
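
A message of this kind typically appears when the bare name of a function template is used where a single function address is required, as with the first argument of `cudaLaunchCooperativeKernel`. The host-only sketch below shows the usual remedy of naming one instantiation explicitly. The signature of `boolPrepareKernel` is not shown in the excerpt, so `scale_kernel` and `kernel_address` are hypothetical names used only for illustration.

```cpp
// Hypothetical host-only stand-in for a templated kernel. In CUDA it would be
// declared __global__ and could not be called directly like a host function.
template <typename T>
void scale_kernel(T* data, T factor) {
    *data *= factor;
}

// The bare name scale_kernel refers to a whole family of functions, so it has
// no single address. Naming one instantiation (scale_kernel<T>) selects
// exactly one function, whose address can then be passed to a launch API.
template <typename T>
const void* kernel_address() {
    void (*instance)(T*, T) = &scale_kernel<T>;
    return reinterpret_cast<const void*>(instance);
}
```

In a real launch, `kernel_address<int>()` (or simply `(void*)scale_kernel<int>`) would take the place of the unqualified template name in the call.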

Jakub Mitura
0
votes
1 answer
error: class "cooperative_groups::__v1::thread_block" has no member "is_valid"
We are trying to use some of the cooperative groups features in CUDA to write a small application. We are using a Tesla V100 card with CUDA 11.0.
However, calling the is_valid() method on a thread_block produces the error below:
error: class…

Satyanvesh D
0
votes
1 answer
Why is cudaLaunchCooperativeKernel() returning not permitted?
I am using a GTX 1050 with a compute capability of 6.1 and CUDA 11.0. I need grid synchronization in my program, so cudaLaunchCooperativeKernel() is needed. I have checked my device query output, and the GPU does have support for cooperative groups.…
0
votes
0 answers
CUDA kernel stops working when using cooperative groups grid sync() function
I was trying to implement some CUDA kernels leveraging the cooperative groups feature when I found that the kernel wouldn't do anything at all. I boiled it down to a small example which can be seen below. As soon as I call the sync() function…

KoKlA
0
votes
0 answers
Why does nvcc refuse to link this simple cooperative-groups program?
Consider the following CUDA program, in a file named foo.cu:
#include
#include
__global__ void my_kernel() {
auto g = cooperative_groups::this_grid();
g.sync();
}
int main(int, char **) {
…

einpoklum