I want to use CUDA streams in PyTorch to parallelize some computations, but I don't know how to do it. For instance, if there are two tasks, A and B, that need to run in parallel, I want to do something like the following:

stream0 = torch.get_stream()
stream1 = torch.get_stream()
with torch.now_stream(stream0):
    # task A
with torch.now_stream(stream1):
    # task B
torch.synchronize()
# get A's and B's results

How can I achieve this in real Python code?


1 Answer

import torch

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

# Initialise CUDA tensors here. E.g.:
A = torch.rand(1000, 1000, device='cuda')
B = torch.rand(1000, 1000, device='cuda')

# Wait for the above tensors to initialise.
torch.cuda.synchronize()

# Launch each matmul on its own stream so the two can overlap.
with torch.cuda.stream(s1):
    C = torch.mm(A, A)
with torch.cuda.stream(s2):
    D = torch.mm(B, B)

# Device-wide barrier: waits for all streams, including s1 and s2,
# so C and D are ready afterwards.
torch.cuda.synchronize()

# Do stuff with C and D.
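
As the comment below notes, torch.cuda.synchronize() is a device-wide barrier: it waits for work on every stream, not just s1 and s2. If you only want to block until these two streams finish, a minimal sketch (reusing s1, s2, C and D from the code above) is to synchronize each stream individually:

s1.synchronize()  # block the host until everything queued on s1 has finished
s2.synchronize()  # likewise for s2
# C and D are now ready; kernels on other streams may still be running.
print(C.sum().item(), D.sum().item())

If the consumer of C and D is itself GPU work on another stream, torch.cuda.current_stream().wait_stream(s1) (and likewise for s2) expresses that dependency on the device without blocking the host.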
 
It's only partially true that torch.cuda.synchronize() waits for C and D: it actually waits for all work submitted to any stream on the device, which includes C and D. You can check in the sources that torch.cuda.synchronize() leads to a call to cudaDeviceSynchronize() (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDAFunctions.cpp), and this is the description of the latter routine: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g10e20b05a95f638a4071a655503df25d – Konstantin Burlachenko Mar 30 '21 at 22:32
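
The same fine-grained waiting can also be expressed with CUDA events, avoiding the device-wide cudaDeviceSynchronize() the comment describes. A minimal sketch using torch.cuda.Event (the names done1 and done2 are illustrative, not from the original answer):

import torch

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()
done1 = torch.cuda.Event()
done2 = torch.cuda.Event()

A = torch.rand(1000, 1000, device='cuda')
B = torch.rand(1000, 1000, device='cuda')
torch.cuda.synchronize()  # make sure A and B are initialised

with torch.cuda.stream(s1):
    C = torch.mm(A, A)
    done1.record(s1)  # mark the point on s1 right after C is enqueued
with torch.cuda.stream(s2):
    D = torch.mm(B, B)
    done2.record(s2)

done1.synchronize()  # block the host only until s1 reaches done1
done2.synchronize()  # likewise for s2
# C and D are ready; other streams are untouched.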