
Starting with the K20, different streams become fully concurrent (they used to be concurrent only on the edge).

However, my program needs the old behavior. Otherwise I have to add a lot of synchronization to solve the dependency problem.

Is it possible to switch stream management to the old way?

einpoklum
worldterminator
  • Can you give more information about what the problem is? By putting work A and B in different streams you are explicitly stating that A and B are independent, so what is the "dependency problem"? – Tom Feb 15 '13 at 14:40
  • @Tom I need to do A-B-C for each data item: A1-B1-C1 for data1, A2-B2-C2 for data2. But A2 cannot start until A1 ends; B and C have no such restriction. I require that only one A, one B, and one C execute at a time, so I designed it so that A[i-2], B[i-1], and C[i] execute concurrently (in different streams). Without the old stream behavior, I cannot do this. – worldterminator Feb 20 '13 at 14:27
  • You're fundamentally violating the programming model. Either put A1 and A2 in the same stream, or use cross stream synchronisation. Relying on the fact that pre-sm35 hardware was introducing false dependencies is foolish. – Tom Feb 21 '13 at 20:52

2 Answers


From the CUDA C Programming Guide section on Asynchronous Concurrent Execution:

A stream is a sequence of commands (possibly issued by different host threads) that execute in order. Different streams, on the other hand, may execute their commands out of order with respect to one another or concurrently; this behavior is not guaranteed and should therefore not be relied upon for correctness (e.g., inter-kernel communication is undefined).

If the application relies on how Compute Capability 2.x and 3.0 devices implement streams, then it violates the definition of streams, and any change to the CUDA driver (e.g. queuing of per-stream requests) or new hardware will break it.

If you need a temporary workaround, I would suggest moving all work to a single user-defined stream. This may hurt performance, but it is likely the only temporary workaround.
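A minimal sketch of that single-stream workaround might look like the following. The kernels `stageA`, `stageB` and `stageC` and the function `runSingleStream` are hypothetical stand-ins for the real A, B and C work, and the buffer layout is an assumption made only for illustration. Everything is issued into one stream created with `cudaStreamCreate`, so the commands run strictly in issue order and no overlap between stages is possible.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the real A, B and C kernels.
__global__ void stageA(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void stageB(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}
__global__ void stageC(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] -= 3.0f;
}

// Runs A-B-C on `items` buffers of `n` floats, all in one user-created stream.
// Returns true if every element equals (0 + 1) * 2 - 3 = -1.
bool runSingleStream(int items, int n) {
    assert(items > 0 && n > 0);
    const size_t count = static_cast<size_t>(items) * n;
    const size_t bytes = count * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t s;
    float *d = nullptr;
    if (cudaStreamCreate(&s) != cudaSuccess) return false;
    if (cudaMalloc(&d, bytes) != cudaSuccess) {
        cudaStreamDestroy(s);
        return false;
    }
    cudaMemsetAsync(d, 0, bytes, s);

    // Same stream for every launch: A1, B1, C1, A2, B2, C2, ... in order.
    for (int k = 0; k < items; ++k) {
        float *buf = d + static_cast<size_t>(k) * n;
        stageA<<<blocks, threads, 0, s>>>(buf, n);
        stageB<<<blocks, threads, 0, s>>>(buf, n);
        stageC<<<blocks, threads, 0, s>>>(buf, n);
    }

    std::vector<float> h(count);
    cudaMemcpyAsync(h.data(), d, bytes, cudaMemcpyDeviceToHost, s);
    bool ok = cudaStreamSynchronize(s) == cudaSuccess;
    for (float v : h) ok = ok && (v == -1.0f);

    cudaFree(d);
    cudaStreamDestroy(s);
    return ok;
}
```

Calling `runSingleStream(3, 1024)` from `main` would process three buffers this way. The cost is that nothing overlaps, which is why this is only a stopgap.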

Greg Smith
  • This is not the answer I want. I know how streams work. But the K20's streams behave differently from those of previous GPUs. I want to know how to switch back, or what other solutions there are. – worldterminator Feb 15 '13 at 01:33
  • The hardware and the driver do not support emulating the CC 3.0 and earlier execution mode on CC 3.5. Your program violates the programming model. It is highly recommended that you fix your program, as there is no guarantee that a future CUDA driver update that adheres to the definition of the programming model will not break your program on CC 3.0 and earlier devices. – Greg Smith Feb 15 '13 at 18:18

Can you express the kernel dependencies with cudaEvent_t objects?

The Streams and Concurrency Webinar shows some quick code snippets on how to use events. Some of the details of that presentation are only applicable to pre-Kepler hardware, but I'm assuming from the original question that you're familiar with how things have changed since Fermi now that there are multiple command queues.
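As a rough sketch of what that could look like for the A-B-C pipeline described in the question's comments: give each stage its own stream, so only one A, one B and one C can run at a time, and use events to make B for an item wait on A for that item, and C wait on B. The kernels `kernelA`, `kernelB` and `kernelC` and the function `runPipelined` are hypothetical placeholders, not code from the question. Only `cudaEventRecord` and `cudaStreamWaitEvent` carry the dependencies.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the real A, B and C kernels.
__global__ void kernelA(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void kernelB(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}
__global__ void kernelC(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] -= 3.0f;
}

// One stream per stage; events order A[k] -> B[k] -> C[k] across streams.
// Returns true if every element equals (0 + 1) * 2 - 3 = -1.
bool runPipelined(int items, int n) {
    assert(items > 0 && n > 0);
    const size_t count = static_cast<size_t>(items) * n;
    const size_t bytes = count * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaMemset(d, 0, bytes);
    cudaDeviceSynchronize();  // make sure the clear is done before any stage runs

    cudaStream_t sA, sB, sC;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);
    cudaStreamCreate(&sC);

    std::vector<cudaEvent_t> aDone(items), bDone(items);
    for (int k = 0; k < items; ++k) {
        cudaEventCreateWithFlags(&aDone[k], cudaEventDisableTiming);
        cudaEventCreateWithFlags(&bDone[k], cudaEventDisableTiming);
    }

    for (int k = 0; k < items; ++k) {
        float *buf = d + static_cast<size_t>(k) * n;
        kernelA<<<blocks, threads, 0, sA>>>(buf, n);  // A[k] follows A[k-1] in sA
        cudaEventRecord(aDone[k], sA);
        cudaStreamWaitEvent(sB, aDone[k], 0);         // B[k] waits for A[k]
        kernelB<<<blocks, threads, 0, sB>>>(buf, n);
        cudaEventRecord(bDone[k], sB);
        cudaStreamWaitEvent(sC, bDone[k], 0);         // C[k] waits for B[k]
        kernelC<<<blocks, threads, 0, sC>>>(buf, n);
    }

    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    std::vector<float> h(count);
    ok = ok && cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (float v : h) ok = ok && (v == -1.0f);

    for (int k = 0; k < items; ++k) {
        cudaEventDestroy(aDone[k]);
        cudaEventDestroy(bDone[k]);
    }
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaStreamDestroy(sC);
    cudaFree(d);
    return ok;
}
```

Because the ordering is stated explicitly, this does not depend on how any particular GPU generation schedules streams. Note that in this sketch stage A is free to run ahead of B and C by more than one item; if that is not acceptable, an extra event wait in `sA` could throttle it.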

Mr Fooz