CUDA streams are hardware-supported queues on CUDA GPUs through which work (kernel launches, memory transfers, etc.) is scheduled.
Questions tagged [cuda-streams]
78 questions
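As a quick illustration of the tag's subject, here is a minimal sketch of issuing independent work to two streams. The kernel, sizes, and scaling factors are placeholders invented for the example; work within one stream runs in issue order, while work in different streams may overlap.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Kernels in different streams may execute concurrently;
    // kernels in the same stream execute in issue order.
    scale<<<(n + 255) / 256, 256, 0, s0>>>(d_a, n, 2.0f);
    scale<<<(n + 255) / 256, 256, 0, s1>>>(d_b, n, 3.0f);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```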
10 votes · 2 answers
CUDA streams not overlapping
I have something very similar to this code:
int k, no_streams = 4;
cudaStream_t stream[no_streams];
for(k = 0; k < no_streams; k++) cudaStreamCreate(&stream[k]);
cudaMalloc(&g_in, size1*no_streams);
cudaMalloc(&g_out, size2*no_streams);
for (k =…
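For context, copy/compute overlap in a loop like the one above generally requires page-locked (pinned) host memory and cudaMemcpyAsync. A hedged sketch of the usual per-stream pipeline, with a placeholder kernel and sizes invented for the example:

```cuda
#include <cuda_runtime.h>

__global__ void twice(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

int main() {
    const int no_streams = 4, n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_in, *h_out, *g_in, *g_out;
    // Pinned host buffers are required for cudaMemcpyAsync to
    // overlap with kernels running in other streams.
    cudaMallocHost(&h_in, bytes * no_streams);
    cudaMallocHost(&h_out, bytes * no_streams);
    cudaMalloc(&g_in, bytes * no_streams);
    cudaMalloc(&g_out, bytes * no_streams);

    cudaStream_t stream[4];
    for (int k = 0; k < no_streams; k++) cudaStreamCreate(&stream[k]);

    // Each stream works on its own slice: copy in, compute, copy out.
    for (int k = 0; k < no_streams; k++) {
        int off = k * n;
        cudaMemcpyAsync(g_in + off, h_in + off, bytes,
                        cudaMemcpyHostToDevice, stream[k]);
        twice<<<(n + 255) / 256, 256, 0, stream[k]>>>(g_in + off,
                                                      g_out + off, n);
        cudaMemcpyAsync(h_out + off, g_out + off, bytes,
                        cudaMemcpyDeviceToHost, stream[k]);
    }
    cudaDeviceSynchronize();

    for (int k = 0; k < no_streams; k++) cudaStreamDestroy(stream[k]);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    cudaFree(g_in); cudaFree(g_out);
    return 0;
}
```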

pmcr
9 votes · 2 answers
Multiple host threads launching individual CUDA kernels
For my CUDA development, I am using a machine with 16 cores, and 1 GTX 580 GPU with 16 SMs. For the work that I am doing, I plan to launch 16 host threads (1 on each core), and 1 kernel launch per thread, each with 1 block and 1024 threads. My goal…

gmemon
8 votes · 1 answer
How to reduce CUDA synchronize latency / delay
This question is related to using CUDA streams to run many kernels.
In CUDA there are many synchronization commands:
cudaStreamSynchronize,
cudaDeviceSynchronize,
cudaThreadSynchronize,
and also cudaStreamQuery to check if streams are empty.
I noticed…
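These calls differ in scope. A hedged sketch of the two most common patterns, a non-blocking poll versus a blocking wait, with a trivial stand-in kernel:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void busy() { /* stand-in for real work */ }

int main() {
    cudaStream_t s;
    cudaStreamCreate(&s);
    busy<<<1, 1, 0, s>>>();

    // Non-blocking: cudaStreamQuery returns cudaSuccess when the
    // stream is empty and cudaErrorNotReady while work is pending.
    while (cudaStreamQuery(s) == cudaErrorNotReady) {
        // The host could do useful work here instead of spinning.
    }

    // Blocking: waits for this one stream only;
    // cudaDeviceSynchronize() would wait for all streams instead.
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    printf("done\n");
    return 0;
}
```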

shadow
7 votes · 2 answers
CUDA Dynamic Parallelism, bad performance
We are having performance issues when using CUDA Dynamic Parallelism. At the moment, CDP is performing at least 3x slower than a traditional approach.
We made the simplest reproducible code to show this issue, which is to increment the value of…

Cristobal Navarro
5 votes · 1 answer
What is the relationship between NVIDIA MPS (Multi-Process Server) and CUDA Streams?
Glancing at the official NVIDIA Multi-Process Server docs, it is unclear to me how MPS interacts with CUDA streams.
Here's an example:
App 0: issues kernels to logical stream 0;
App 1: issues kernels to (its own) logical stream 0.
In this case,
1)…

Covi
5 votes · 2 answers
Are CUDA streams device-associated? And how do I get a stream's device?
I have a CUDA stream which someone handed to me - a cudaStream_t value. The CUDA Runtime API does not seem to indicate how I can obtain the index of the device with which this stream is associated.
Now, I know that cudaStream_t is just a pointer to…

einpoklum
5 votes · 2 answers
Can the NVIDIA K20c use the old stream management behavior?
Since the K20, different streams have become fully concurrent (they used to be concurrent only at the edges).
However, my program needs the old behavior, or else I have to do a lot of synchronization to solve the dependency problem.
Is it possible to switch stream management to the…

worldterminator
4 votes · 1 answer
What is the difference between Nvidia Hyper Q and Nvidia Streams?
I always thought that Hyper-Q technology is nothing but streams in the GPU. Later I found that I was wrong (am I?). So I was doing some reading about Hyper-Q and got even more confused.
I was going through one article and it had these two statements:
A.…

sandeep.ganage
4 votes · 1 answer
How to make multiple cuBLAS calls (e.g. cublasDgemm) really execute concurrently in multiple cudaStreams
I want to make two cuBLAS calls (e.g. cublasDgemm) really execute concurrently in two cudaStreams.
As we know, the cuBLAS API is asynchronous; level-3 routines like cublasDgemm don't block the host. That means the following code (in the default cudaStream)…
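For reference, the usual way to route cuBLAS work into distinct streams is cublasSetStream before each call. A hedged sketch with placeholder matrix sizes; even so, the GEMMs only actually overlap if one of them does not already fill the GPU:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 512;
    double *A, *B, *C1, *C2;
    cudaMalloc(&A,  n * n * sizeof(double));
    cudaMalloc(&B,  n * n * sizeof(double));
    cudaMalloc(&C1, n * n * sizeof(double));
    cudaMalloc(&C2, n * n * sizeof(double));

    cublasHandle_t h;
    cublasCreate(&h);
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    const double alpha = 1.0, beta = 0.0;

    // Bind the handle to a stream before each call; the two GEMMs
    // are then enqueued in different streams and may run concurrently.
    cublasSetStream(h, s1);
    cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C1, n);
    cublasSetStream(h, s2);
    cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C2, n);

    cudaDeviceSynchronize();
    cublasDestroy(h);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(A); cudaFree(B); cudaFree(C1); cudaFree(C2);
    return 0;
}
```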

Yangsong Zhang
4 votes · 1 answer
Is the GTX 680 Capable of Concurrent Data Transfer?
I expected that the GTX 680 (which is one of the latest versions of GPUs) is capable of concurrent data transfer (concurrent data transfer in both directions). But when I run the CUDA SDK "Device Query", the test result for the term "Concurrent copy and…

Blue_Black
3 votes · 1 answer
Can we overlap compute operation with memory operation without pinned memory on CPU?
I'm trying to overlap computation and memory operations with the HuggingFace SwitchTransformer.
Here's a detailed explanation.
The memory operation is data movement from CPU to GPU, and its size is 4 MB per block.
The number of blocks is variable…

Ryan
3 votes · 1 answer
What's the capacity of a CUDA stream (=queue)?
A CUDA stream is a queue of tasks: memory copies, event firing, event waits, kernel launches, callbacks...
But these queues don't have infinite capacity. In fact, empirically, I find that this limit is not super-high, e.g. in the thousands, not…
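One way to probe this empirically is to enqueue many tiny operations and time each launch: when the stream's queue fills, the nominally asynchronous launch starts to block. A hedged sketch of such a probe; the threshold it observes, and the 1 ms cutoff used to detect blocking, are implementation- and driver-dependent choices made for the example:

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

__global__ void nop() {}

int main() {
    cudaStream_t s;
    cudaStreamCreate(&s);
    for (int i = 0; i < 100000; i++) {
        auto t0 = std::chrono::steady_clock::now();
        nop<<<1, 1, 0, s>>>();   // nominally asynchronous
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        // A sudden jump in launch latency suggests the queue is full
        // and the launch call is now blocking the host.
        if (us > 1000) {
            printf("launch %d blocked for %lld us\n", i, (long long)us);
            break;
        }
    }
    cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
    return 0;
}
```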

einpoklum
3 votes · 0 answers
Execute another model in parallel to a model's forward pass with PyTorch
I am trying to make some changes to the ResNet-18 model in PyTorch to invoke the execution of another auxiliary trained model which takes in the ResNet intermediate layer output at the end of each ResNet block as an input and makes some auxiliary…

jallikattu
3 votes · 1 answer
Concurrency of one large kernel with many small kernels and memcopies (CUDA)
I am developing a multi-GPU accelerated flow solver. Currently I am trying to implement communication hiding. That means that while data is exchanged, the GPU computes the part of the mesh that is not involved in communication, and computes the rest of…

Lenz
3 votes · 5 answers
Get rid of busy waiting during asynchronous CUDA stream execution
I'm looking for a way to get rid of busy waiting in the host thread in the following code (do not copy this code, it only shows the idea of my problem; it has many basic bugs):
cudaStream_t streams[S_N];
for (int i = 0; i < S_N; i++) {
…
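One commonly suggested alternative to polling is to have the runtime call back into the host when a stream drains, e.g. via cudaLaunchHostFunc (CUDA 10+; cudaStreamAddCallback in older toolkits). A hedged sketch with a stand-in kernel:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void work() { /* stand-in for real work */ }

// Runs on a runtime-managed thread once all prior work
// in the stream has completed.
void CUDART_CB on_done(void *userData) {
    printf("stream %d finished\n", *(int *)userData);
}

int main() {
    const int S_N = 4;
    cudaStream_t streams[S_N];
    int ids[S_N];
    for (int i = 0; i < S_N; i++) {
        cudaStreamCreate(&streams[i]);
        ids[i] = i;
        work<<<1, 1, 0, streams[i]>>>();
        // No busy waiting: the callback fires when the stream drains.
        cudaLaunchHostFunc(streams[i], on_done, &ids[i]);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < S_N; i++) cudaStreamDestroy(streams[i]);
    return 0;
}
```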

kokosing