Questions tagged [cuda-events]

15 questions

6 votes, 1 answer

cudaEventSynchronize vs cudaDeviceSynchronize

I am new to CUDA and got a little confused with cudaEvent. I now have a code sample that goes as follows: float elapsedTime; cudaEvent_t start,…
Bojian Zheng (2,167 reputation)
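
The distinction raised in the first question can be sketched as follows. This is a minimal example, assuming a single GPU and the legacy default stream; the kernel `scale` and the helper `time_scale_ms` are hypothetical names introduced only for illustration. It records two events around one launch and waits on the second with `cudaEventSynchronize`, which blocks the host only until the work that event captured has completed, whereas `cudaDeviceSynchronize` waits for everything pending on the device.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernel: exists only so there is something to time.
__global__ void scale(float *x, int n, float a) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

// Times a single launch of `scale` on the default stream, in milliseconds.
float time_scale_ms(int n) {
  assert(n > 0);
  float *d_x = nullptr;
  CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
  CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));

  // Both records are only enqueued; the host does not wait here.
  CUDA_CHECK(cudaEventRecord(start, 0));
  scale<<<(n + 255) / 256, 256>>>(d_x, n, 2.0f);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaEventRecord(stop, 0));

  // Blocks the host until the work captured by `stop` has finished.
  // cudaDeviceSynchronize() would also make the timing valid, but it waits
  // for all outstanding work on the device, not just this stream's.
  CUDA_CHECK(cudaEventSynchronize(stop));

  float ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  CUDA_CHECK(cudaFree(d_x));
  return ms;
}
```

Calling `time_scale_ms(1 << 20)` from host code returns the elapsed GPU time in milliseconds. The narrower `cudaEventSynchronize` wait starts to matter once other streams have unrelated work in flight.
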

5 votes, 2 answers

cudaStreamWaitEvent does not seem to wait

I am attempting to write a small demo program that has two CUDA streams progressing and, governed by events, waiting for each other. So far this program looks like this: // event.cu #include #include #include…
Markus-Hermann (789 reputation)

4 votes, 1 answer

Will cudaStreamWaitEvent block the host?

I understand that cudaEventSynchronize will block the host until the event has been triggered. However, what about cudaStreamWaitEvent? Will cudaStreamWaitEvent block only the specified stream while the host proceeds, or will the host be…
Hailiang Zhang (17,604 reputation)
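
A minimal sketch of the cross-stream pattern that the cudaStreamWaitEvent questions above revolve around, assuming one GPU; `produce`, `consume`, and `cross_stream_wait` are hypothetical names. `cudaStreamWaitEvent` only enqueues a dependency into the waiting stream, so the host returns from the call at once and blocks only at `cudaStreamSynchronize`. The streams are created with `cudaStreamNonBlocking` so that the event, rather than implicit default-stream synchronization, is what orders the two kernels.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernels: `consume` must see the values written by `produce`.
__global__ void produce(int *d, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] = i;
}
__global__ void consume(int *d, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] *= 2;
}

// Returns true if every element ends up as 2*i, i.e. `consume` on s2
// really ran after `produce` on s1.
bool cross_stream_wait(int n) {
  assert(n > 0);
  int *d = nullptr;
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));

  cudaStream_t s1, s2;
  CUDA_CHECK(cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking));
  CUDA_CHECK(cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking));
  cudaEvent_t produced;
  // Timing is not needed for pure synchronization, so disable it.
  CUDA_CHECK(cudaEventCreateWithFlags(&produced, cudaEventDisableTiming));

  const int blocks = (n + 255) / 256;
  produce<<<blocks, 256, 0, s1>>>(d, n);
  CUDA_CHECK(cudaEventRecord(produced, s1));
  // Enqueues a wait into s2 and returns to the host immediately; only work
  // submitted to s2 after this call is held back until `produced` completes.
  CUDA_CHECK(cudaStreamWaitEvent(s2, produced, 0));
  consume<<<blocks, 256, 0, s2>>>(d, n);
  CUDA_CHECK(cudaGetLastError());

  // The host could do unrelated work here; it blocks only at this call.
  CUDA_CHECK(cudaStreamSynchronize(s2));

  std::vector<int> h(n);
  CUDA_CHECK(cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost));
  bool ok = true;
  for (int i = 0; i < n; ++i) ok = ok && (h[i] == 2 * i);

  CUDA_CHECK(cudaEventDestroy(produced));
  CUDA_CHECK(cudaStreamDestroy(s1));
  CUDA_CHECK(cudaStreamDestroy(s2));
  CUDA_CHECK(cudaFree(d));
  return ok;
}
```
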

3 votes, 0 answers

What does a slice mean in cuda?

I'm new to CUDA programming. I have to profile my application on the GPU using nvprof. I found a metric, l2_subp0_write_sector_misses, which means the number of write requests sent to DRAM from slice 0 of the L2 cache. But I don't know what a slice…
kh.chung (53 reputation)

2 votes, 1 answer

Non-blocking synchronization of streams in CUDA?

Is it possible to synchronize two CUDA streams without blocking the host? I know there's cudaStreamWaitEvent, which is non-blocking. But what about the creation and destruction of the events using cudaEventCreate and cudaEventDestroy. The…
spfrnd (886 reputation)

2 votes, 2 answers

Can a CUDA event be fired from device-side code?

Is there any way to fire an event (for benchmarking purposes, similar to cudaEvents in the CPU code) from a device kernel in CUDA? E.g. suppose I would like to measure the time passed from kernel start to the first thread ever that starts a…
AGer (31 reputation)

1 vote, 3 answers

How can I have a CUDA stream for not-yet-scheduled work? (i.e. user-event-like pattern)

I have some work I want to do on a CUDA stream, say a kernel K, which depends on previous work that needs to be done on the CPU. The exact details of the CPU work are not something that's known to me when I'm scheduling K; I just want K not to start…
einpoklum (118,144 reputation)

1 vote, 1 answer

Is there a way to block and unblock a CUDA stream arbitrarily?

I need to pause the execution of all calls in a stream from a certain point in one part of the program until another part of the program decides to unpause this stream at an arbitrary time. This is the requirement of the application I'm working on,…
surabax (15 reputation)

1 vote, 1 answer

How can I unset a CUDA event?

I have a processing loop on the host, where I record an event in a GPU stream. Then another stream waits for that event (waits for event's state "set" or "true"). Will this function (cudaStreamWaitEvent) unset this event (so, switching it to "unset"…
psihodelia (29,566 reputation)
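
On the question of "unsetting" an event, here is a hedged sketch of reusing one event across loop iterations, assuming one GPU and hypothetical kernels `fill` and `add_one` plus a hypothetical helper `reuse_event`. As the Runtime API reference describes it, `cudaStreamWaitEvent` waits only for the most recent `cudaEventRecord` issued before the call, and once it returns the event may be recorded again without affecting that wait. Each new record simply overwrites what the event captures.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernels for illustration.
__global__ void fill(int *slot, int n, int value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) slot[i] = value;
}
__global__ void add_one(const int *src, int *dst, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) dst[i] = src[i] + 1;
}

// Reuses one event for `iters` producer/consumer hand-offs; returns true
// if every consumer launch saw the data of its own producer launch.
bool reuse_event(int iters, int n) {
  assert(iters > 0 && n > 0);
  const size_t total = static_cast<size_t>(iters) * n;
  int *src = nullptr, *dst = nullptr;
  CUDA_CHECK(cudaMalloc(&src, total * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&dst, total * sizeof(int)));

  cudaStream_t producer, consumer;
  CUDA_CHECK(cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking));
  CUDA_CHECK(cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking));
  cudaEvent_t ready;
  CUDA_CHECK(cudaEventCreateWithFlags(&ready, cudaEventDisableTiming));

  const int blocks = (n + 255) / 256;
  for (int k = 0; k < iters; ++k) {
    // Separate slots per iteration, so the producer never overwrites data
    // that an earlier consumer launch may still be reading.
    int *s = src + static_cast<size_t>(k) * n;
    int *d = dst + static_cast<size_t>(k) * n;
    fill<<<blocks, 256, 0, producer>>>(s, n, k);
    // Re-recording overwrites whatever `ready` captured before.
    CUDA_CHECK(cudaEventRecord(ready, producer));
    // Binds to the record just above; re-recording `ready` in the next
    // iteration does not change this already-enqueued wait.
    CUDA_CHECK(cudaStreamWaitEvent(consumer, ready, 0));
    add_one<<<blocks, 256, 0, consumer>>>(s, d, n);
  }
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaStreamSynchronize(consumer));

  std::vector<int> h(total);
  CUDA_CHECK(cudaMemcpy(h.data(), dst, total * sizeof(int),
                        cudaMemcpyDeviceToHost));
  bool ok = true;
  for (size_t j = 0; j < total; ++j)
    ok = ok && h[j] == static_cast<int>(j / n) + 1;  // slot k holds k + 1

  CUDA_CHECK(cudaEventDestroy(ready));
  CUDA_CHECK(cudaStreamDestroy(producer));
  CUDA_CHECK(cudaStreamDestroy(consumer));
  CUDA_CHECK(cudaFree(src));
  CUDA_CHECK(cudaFree(dst));
  return ok;
}
```
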

1 vote, 1 answer

Wait for event in subsequent stream

I am trying to implement the following kind of pipeline on the GPU with CUDA: I have four streams, each with a Host2Device copy, a kernel call and a Device2Host copy. However, the kernel calls have to wait for the Host2Device copy of the next stream…
Nico Schertler (32,049 reputation)
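
The pipeline described in the question above can be sketched roughly as follows, assuming one GPU, four streams, and a hypothetical kernel `add_halo` that reads one element past its own chunk (one possible reason a kernel would depend on the next stream's copy; the asker's actual reason is not stated). The helper `pipeline_with_halo` is likewise hypothetical. The detail that matters is host-side ordering: every event is recorded before any stream waits on it, because `cudaStreamWaitEvent` on an event that has never been recorded does not wait for anything.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

constexpr int kStreams = 4;

// Hypothetical kernel: each element also reads the first element of the
// next chunk (a one-element halo), so chunk c needs chunk c+1 on the device.
__global__ void add_halo(const float *in, float *out, int chunk, bool has_next) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < chunk) out[i] = in[i] + (has_next ? in[chunk] : 0.0f);
}

bool pipeline_with_halo(int chunk) {
  assert(chunk > 0);
  const int n = kStreams * chunk;
  const size_t bytes = chunk * sizeof(float);
  float *h_in = nullptr, *h_out = nullptr, *d_in = nullptr, *d_out = nullptr;
  // Pinned host memory, so the async copies can overlap across streams.
  CUDA_CHECK(cudaMallocHost((void **)&h_in, n * sizeof(float)));
  CUDA_CHECK(cudaMallocHost((void **)&h_out, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));
  for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

  cudaStream_t s[kStreams];
  cudaEvent_t copied[kStreams];
  for (int c = 0; c < kStreams; ++c) {
    CUDA_CHECK(cudaStreamCreateWithFlags(&s[c], cudaStreamNonBlocking));
    CUDA_CHECK(cudaEventCreateWithFlags(&copied[c], cudaEventDisableTiming));
  }

  // Pass 1: enqueue every Host2Device copy and mark its completion.
  for (int c = 0; c < kStreams; ++c) {
    CUDA_CHECK(cudaMemcpyAsync(d_in + c * chunk, h_in + c * chunk, bytes,
                               cudaMemcpyHostToDevice, s[c]));
    CUDA_CHECK(cudaEventRecord(copied[c], s[c]));
  }
  // Pass 2: the kernel in stream c waits for the copy in stream c+1. In a
  // single loop the wait would precede that record and so would not wait.
  for (int c = 0; c < kStreams; ++c) {
    const bool has_next = c + 1 < kStreams;
    if (has_next) CUDA_CHECK(cudaStreamWaitEvent(s[c], copied[c + 1], 0));
    add_halo<<<(chunk + 255) / 256, 256, 0, s[c]>>>(
        d_in + c * chunk, d_out + c * chunk, chunk, has_next);
    CUDA_CHECK(cudaMemcpyAsync(h_out + c * chunk, d_out + c * chunk, bytes,
                               cudaMemcpyDeviceToHost, s[c]));
  }
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaDeviceSynchronize());

  // Every chunk except the last adds its halo value of 1.0f.
  bool ok = true;
  for (int i = 0; i < n; ++i)
    ok = ok && h_out[i] == (i < n - chunk ? 2.0f : 1.0f);

  for (int c = 0; c < kStreams; ++c) {
    CUDA_CHECK(cudaEventDestroy(copied[c]));
    CUDA_CHECK(cudaStreamDestroy(s[c]));
  }
  CUDA_CHECK(cudaFreeHost(h_in));
  CUDA_CHECK(cudaFreeHost(h_out));
  CUDA_CHECK(cudaFree(d_in));
  CUDA_CHECK(cudaFree(d_out));
  return ok;
}
```
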

0 votes, 1 answer

What is cuEventRecord guaranteed to do if it gets the default-stream's handle?

Suppose I call cuEventRecord(0, my_event_handle). cuEventRecord() requires the stream and the event to belong to the same context. Now, one can interpret the 0 as "the default stream in the appropriate context" - the requirements are satisfied and…
einpoklum (118,144 reputation)

0 votes, 1 answer

Reusing cudaEvent to serialize multiple streams

Suppose I have a struct: typedef enum {ON_CPU,ON_GPU,ON_BOTH} memLocation; typedef struct foo *foo; struct foo { cudaEvent_t event; float *deviceArray; float *hostArray; memLocation arrayLocation; }; a function: void…
Jacob Faib (1,062 reputation)

0 votes, 1 answer

Recording elapsed time of CUDA kernels with cudaEventRecord() for multi-GPU program

I have a sparse triangular solver that works with 4 Tesla V100 GPUs. I completed implementation and all things work well in terms of accuracy. However, I am using a CPU timer to calculate elapsed time. I know that the CPU timer is not the perfect…
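
One way to get per-GPU kernel times in a multi-GPU setting like the one in the question above, sketched under assumptions (any number of visible GPUs, a hypothetical kernel `busy`, and a hypothetical helper `time_per_device`; this is not the asker's solver). Per the CUDA C++ Programming Guide, streams and events belong to the device that is current when they are created, `cudaEventRecord` fails if the event and stream belong to different devices, and `cudaEventElapsedTime` fails if the two events belong to different devices, so the sketch gives each GPU its own stream and its own start/stop pair.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernel that just gives each GPU some work.
__global__ void busy(float *x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = 0.5f * x[i] + 1.0f;
}

// Returns one elapsed time (ms) per visible device.
std::vector<float> time_per_device(int n) {
  assert(n > 0);
  int count = 0;
  CUDA_CHECK(cudaGetDeviceCount(&count));
  std::vector<float *> buf(count, nullptr);
  std::vector<cudaStream_t> stream(count);
  std::vector<cudaEvent_t> start(count), stop(count);

  // Enqueue on every device first; nothing here blocks the host, so the
  // GPUs can run concurrently.
  for (int d = 0; d < count; ++d) {
    CUDA_CHECK(cudaSetDevice(d));  // stream and events below belong to d
    CUDA_CHECK(cudaMalloc(&buf[d], n * sizeof(float)));
    CUDA_CHECK(cudaMemset(buf[d], 0, n * sizeof(float)));
    CUDA_CHECK(cudaStreamCreate(&stream[d]));
    CUDA_CHECK(cudaEventCreate(&start[d]));
    CUDA_CHECK(cudaEventCreate(&stop[d]));
    CUDA_CHECK(cudaEventRecord(start[d], stream[d]));  // same device as event
    busy<<<(n + 255) / 256, 256, 0, stream[d]>>>(buf[d], n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop[d], stream[d]));
  }

  std::vector<float> ms(count, 0.0f);
  for (int d = 0; d < count; ++d) {
    CUDA_CHECK(cudaSetDevice(d));
    CUDA_CHECK(cudaEventSynchronize(stop[d]));
    // Both events come from device d, as cudaEventElapsedTime requires.
    CUDA_CHECK(cudaEventElapsedTime(&ms[d], start[d], stop[d]));
    CUDA_CHECK(cudaEventDestroy(start[d]));
    CUDA_CHECK(cudaEventDestroy(stop[d]));
    CUDA_CHECK(cudaStreamDestroy(stream[d]));
    CUDA_CHECK(cudaFree(buf[d]));
  }
  return ms;
}
```

Each entry is the time for that device's own kernel; comparing or summing them is left to the caller, since events from different devices cannot be passed together to `cudaEventElapsedTime`.
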

0 votes, 1 answer

Asynchronous behavior of CUDA events within a CUDA stream

This question is about the notion of a CUDA stream (Stream) and the apparent anomaly with CUDA events (Event) recorded on a stream. Consider the following code demonstrating this anomaly: cudaEventRecord(eventStart, stream1) kernel1<<<...,…
kesari (536 reputation)

0 votes, 1 answer

Is cudaEventRecord affected by the identity of the current device?

cudaEventRecord takes an event ID and a stream ID as parameters. The Runtime API reference does not say whether the stream is required to be associated with the current device - and I can't test whether that's the case since I only have one GPU at…
einpoklum (118,144 reputation)