I want to use CUDA streams in PyTorch to parallelize some computations, but I don't know how to do it. For instance, if there are two tasks, A and B, that need to run in parallel, I want to do something like the following:
stream0 = torch.cuda.Stream()
stream1 = torch.cuda.Stream()
with torch.cuda.stream(stream0):
    ...  # task A
with torch.cuda.stream(stream1):
    ...  # task B
torch.cuda.synchronize()
# get A's and B's results
How can I achieve this in real Python code?
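A minimal sketch of the pattern above, using the real PyTorch API (`torch.cuda.Stream`, the `torch.cuda.stream` context manager, and `torch.cuda.synchronize`). The two matrix multiplications standing in for task A and task B, and the `run_parallel` function name, are illustrative choices, not part of the original question; the CPU fallback branch is there only so the sketch runs on machines without a GPU. Note that kernels issued on different streams *may* overlap on the device, but only if each kernel leaves resources free; the streams merely remove the ordering constraint.

```python
import torch

def run_parallel():
    # Two independent workloads (stand-ins for task A and task B)
    a_in = torch.randn(512, 512)
    b_in = torch.randn(512, 512)

    if torch.cuda.is_available():
        a_in, b_in = a_in.cuda(), b_in.cuda()
        stream0 = torch.cuda.Stream()
        stream1 = torch.cuda.Stream()
        # Work enqueued inside `with torch.cuda.stream(s):` is issued on
        # stream s instead of the default stream, so the two tasks are
        # not ordered with respect to each other.
        with torch.cuda.stream(stream0):
            out_a = a_in @ a_in  # task A
        with torch.cuda.stream(stream1):
            out_b = b_in @ b_in  # task B
        # Block the host until all streams on the current device finish,
        # so out_a and out_b are safe to read.
        torch.cuda.synchronize()
    else:
        # No GPU available: run the tasks sequentially on the CPU
        out_a = a_in @ a_in
        out_b = b_in @ b_in
    return out_a, out_b
```

One caveat: the default stream has special synchronization semantics, so if you interleave work on the default stream with these side streams you may need explicit `stream.wait_stream(...)` or event-based ordering to avoid races.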