Directed graphs whose nodes are CUDA operations (kernel launches, memory copies, etc.) and whose edges are dependencies between them. They are used to schedule multiple interdependent operations for execution on GPUs with a single API call.
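To make the node-and-edge description concrete, here is a minimal host-only sketch in plain C++. It is a toy model and not the CUDA graph API: `ToyGraph`, `add_node` and `launch` are hypothetical names invented for illustration, and the "operations" are ordinary host callables rather than kernels or memory copies. It only shows the idea of declaring dependencies up front and then running everything with one call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy host-only model of a dependency graph (hypothetical; NOT the CUDA API).
struct ToyGraph {
    std::vector<std::function<void()>> ops;      // node payloads
    std::vector<std::vector<std::size_t>> succ;  // edges: node -> its dependents
    std::vector<std::size_t> indegree;           // number of dependencies per node

    // Add a node that may run only after every node listed in `deps`.
    // Dependencies must already exist, which keeps the graph acyclic.
    std::size_t add_node(std::function<void()> op,
                         const std::vector<std::size_t>& deps = {}) {
        const std::size_t id = ops.size();
        ops.push_back(std::move(op));
        succ.emplace_back();
        indegree.push_back(deps.size());
        for (std::size_t d : deps) {
            assert(d < id && "dependency must be an existing node");
            succ[d].push_back(id);
        }
        return id;
    }

    // Run the whole graph with a single call, in an order that respects
    // every edge (Kahn's topological ordering). The graph can be relaunched.
    void launch() const {
        std::vector<std::size_t> remaining = indegree;
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < ops.size(); ++i)
            if (remaining[i] == 0) ready.push(i);
        while (!ready.empty()) {
            const std::size_t n = ready.front();
            ready.pop();
            ops[n]();
            for (std::size_t s : succ[n])
                if (--remaining[s] == 0) ready.push(s);
        }
    }
};
```

In this sketch a diamond such as A -> {B, C} -> D is built with four `add_node` calls and then executed, possibly many times, with `launch()`. A real CUDA graph follows the same build-once, launch-many pattern, but the driver is free to run independent nodes concurrently on the GPU, which this sequential toy does not model.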
Questions tagged [cuda-graphs]
18 questions
3
votes
1 answer
Catching an exception thrown from a callback in cudaLaunchHostFunc
I want to check for an error flag living in managed memory that might have been written by a kernel running on a certain stream. Depending on the error flag I need to throw an exception.
I would simply sync this stream and check the flag from the…

Raul
- 41
- 4
3
votes
1 answer
Call graphs for CUDA
I am trying to generate call graphs for some CUDA code of mine with egypt, but the usual way doesn't seem to work (since nvcc doesn't have any flag that does the same thing as -fdump-rtl-expand).
More details:
I have a really large code (of…

Kostis
- 447
- 4
- 13
2
votes
2 answers
Using multi streams in cuda graph, the execution order is uncontrolled
I am using the CUDA graph stream capture API to implement a small demo with multiple streams. Following the CUDA Programming Guide, I wrote the complete code. As far as I know, kernelB should execute on stream1, but with nsys I found kernelB is…

poohRui
- 613
- 5
- 9
2
votes
1 answer
How do the nodes in a CUDA graph connect?
CUDA graphs are a new way to synthesize complex operations from multiple operations. With "stream capture", it appears that you can run a mix of operations, including cuBLAS and similar library operations, and capture them as a single…

MSalters
- 173,980
- 10
- 155
- 350
1
vote
1 answer
Using a loop in a CUDA graph
I have kernels A, B, and C, which need to be executed sequentially.
A->B->C
They are executed in a while loop until some condition is met.
while(predicate) {
A->B->C
}
The while loop may be executed from 3 to 2000 times - information about a…

Jakub Mitura
- 63
- 7
1
vote
2 answers
cudaGraph: Multi-threaded stream capturing causes errors only when run in cuda-memcheck
I have a program where multiple host threads try to capture a cuda graph and execute it.
It produces the correct results, but it cannot be run with cuda-memcheck.
When run with cuda-memcheck, the following error appears.
Program hit…

Abator Abetor
- 2,345
- 1
- 10
- 12
1
vote
5 answers
What is the use of task graphs in CUDA 10?
CUDA 10 added runtime API calls for putting streams (= queues) in "capture mode", so that instead of executing, they are returned in a "graph". These graphs can then be made to actually execute, or they can be cloned.
But what is the rationale…

einpoklum
- 118,144
- 57
- 340
- 684
0
votes
1 answer
Behavior of cudaGraphInstantiateFlagUseNodePriority
My understanding of cudaGraphInstantiateFlagUseNodePriority is that it prioritizes kernel calls.
For example, suppose a CUDA graph contains three independent kernels, first, second and third, and each kernel waits for 1 s and prints its name.
If we set kernel graph…
0
votes
1 answer
Is it possible to execute more than one CUDA graph's host execution node in different streams concurrently?
Investigating possible solutions for this problem, I thought about using CUDA graphs' host execution nodes (cudaGraphAddHostNode). I was hoping to have the option to block and unblock streams on the host side instead of the device side with the wait…

surabax
- 15
- 5
0
votes
1 answer
What should I set the flags field of CUDA_BATCH_MEM_OP_NODE_PARAMS?
The CUDA graph API exposes a function call for adding a "batch memory operations" node to a graph:
CUresult cuGraphAddBatchMemOpNode (
CUgraphNode* phGraphNode,
CUgraph hGraph,
const CUgraphNode* dependencies,
size_t numDependencies,…

einpoklum
- 118,144
- 57
- 340
- 684
0
votes
1 answer
What type should be pointed to for the result of cuDeviceGetGraphMemAttribute()?
cuDeviceGetGraphMemAttribute() takes a void pointer to a result variable. But - what type does it expect the pointed-to value to be? The documentation (for CUDA v12.0) doesn't say. I'm guessing it's an unsigned 64-bit type, but I want to make sure.

einpoklum
- 118,144
- 57
- 340
- 684
0
votes
1 answer
How can I tell whether a copy-node search failed, or whether my node or graph are invalid?
Consider the CUDA graphs API function cuGraphNodeFindInClone(). The documentation says that it:
Returns:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE
This seems problematic to me. How can I tell whether the search failed (e.g. because there is no copy of…

einpoklum
- 118,144
- 57
- 340
- 684
0
votes
1 answer
Simple CUDA graph example doesn't produce the expected result
I am testing out CUDA graphs. My graph is as follows.
The code for this is as follows:
#include
#include
#include
#include
#include
#define NumThreads 20
#define NumBlocks 1
template

M46f988b814
- 31
- 5
0
votes
1 answer
Error with a captured CUDA graph and asynchronous memory allocations in a loop
I am trying to implement a CUDA graph experiment. There are three kernels: kernel_0, kernel_1, and kernel_2. They are executed sequentially and depend on one another. Right now I am only going to capture kernel_1. This is my code:
#include…

kingwales
- 129
- 8
0
votes
1 answer
CUDA Graph Problem: Results not computed for the first iteration
I am trying to use CUDA Graphs to compute a Fast Fourier Transform (FFT) with CUDA's cuFFT APIs.
I modified the sample FFT code on GitHub into the following FFT code using CUDA Graphs:
#include
#include…

skm
- 5,015
- 8
- 43
- 104