A Queue accepts Command Buffers containing operations of a given type (indicated by the queue family flags). Commands submitted to a Queue have a Submission Order, so they are subject to synchronization by Pipeline Barriers, Subpass Dependencies, and Events (while across queues a Semaphore or better has to be used).
There's one trick: COMPUTE and GRAPHICS can always implicitly accept TRANSFER workload, even if the QueueFamilyProperties do not list it. See the Note below the specification of VkQueueFlagBits.
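The implicit TRANSFER rule can be made explicit when inspecting reported flags. The following is a minimal sketch, not a definitive implementation: the constants are local stand-ins that mirror the VkQueueFlagBits values so the snippet compiles without the Vulkan SDK, and effectiveQueueFlags is a hypothetical helper name, not part of the Vulkan API.

```cpp
#include <cstdint>

// Local stand-ins mirroring VkQueueFlagBits values (assumption: real code
// would include <vulkan/vulkan.h> and use the VK_QUEUE_*_BIT names instead).
constexpr std::uint32_t kGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransferBit = 0x4; // VK_QUEUE_TRANSFER_BIT

// Hypothetical helper: returns the reported flags with the implied TRANSFER
// capability added whenever GRAPHICS or COMPUTE is present.
constexpr std::uint32_t effectiveQueueFlags(std::uint32_t reported) {
    return (reported & (kGraphicsBit | kComputeBit))
               ? (reported | kTransferBit)
               : reported;
}
```

With this, a family that reports only COMPUTE is treated as COMPUTE|TRANSFER, while a family with neither GRAPHICS nor COMPUTE is left unchanged.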
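Transfer is for copy commands such as vkCmdCopyBuffer and vkCmdCopyImage (note that the Specification lists vkCmdBlitImage as GRAPHICS-only). Sparse is something like paging; it allows binding multiple Memory handles to a single Image, and it allows re-binding different memory later too.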
In the Specification, the entry for each vkCmd* command always states its "Supported Queue Types".
A Queue Family is a group of Queues that have a special relation to each other. Some things are restricted to a single Queue Family, such as Images (their ownership has to be transferred between Queue Families) or a Command Pool (it creates Command Buffers only for consumption by the given Queue Family and no other). Theoretically, on some exotic device there could be multiple Queue Families with the same Flags.
That's pretty much everything the Vulkan Specification guarantees. See an Issue about this at KhronosGroup/Vulkan-Docs#569.
There are some vendor-specific materials given, e.g.:
GPUs have asynchronous Graphics Engine(s), Compute Engine(s), and Copy/DMA Engine(s). Graphics and Compute would of course contend for the same Compute Units of the GPU.
They usually have only one Graphics Frontend. That is a bottleneck for graphics operations, so that means there's no point in using more than one Graphics Queue.
There are two modes of operation for Compute: Synchronous Compute (exposed as the GRAPHICS|COMPUTE family) and Async Compute (exposed as the COMPUTE-only family). The first is a safe choice. The second can give you about 10 % perf, but is trickier and requires more effort. The AMD article suggests always doing the first as a baseline.
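The choice between the two compute modes comes down to which queue family index is picked. Below is a rough sketch under stated assumptions: the family flags are passed as a plain vector (in real code they would come from vkGetPhysicalDeviceQueueFamilyProperties), the constants are local stand-ins for the VkQueueFlagBits values, "COMPUTE-only" is interpreted as "COMPUTE without GRAPHICS", and findFamily and pickComputeFamily are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins mirroring VkQueueFlagBits values (assumption).
constexpr std::uint32_t kGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT

// Index of the first family that has all `required` bits and none of the
// `excluded` bits, or -1 if there is no such family.
int findFamily(const std::vector<std::uint32_t>& familyFlags,
               std::uint32_t required, std::uint32_t excluded) {
    for (std::size_t i = 0; i < familyFlags.size(); ++i) {
        const std::uint32_t f = familyFlags[i];
        if ((f & required) == required && (f & excluded) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Synchronous Compute: the GRAPHICS|COMPUTE family (the baseline).
// Async Compute: a family with COMPUTE but no GRAPHICS, if one is exposed;
// otherwise fall back to the synchronous family.
int pickComputeFamily(const std::vector<std::uint32_t>& familyFlags,
                      bool wantAsync) {
    if (wantAsync) {
        const int asyncFamily = findFamily(familyFlags, kComputeBit, kGraphicsBit);
        if (asyncFamily >= 0) {
            return asyncFamily;
        }
    }
    return findFamily(familyFlags, kGraphicsBit | kComputeBit, 0);
}
```

Starting with wantAsync set to false gives the baseline; switching it on later lets the async path be measured against that baseline on the same device.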
There can theoretically be as many Compute Queues as there are Compute Units on the GPU. But AMD argues there's no benefit to more than two Async Compute Queues and exposes that many. NVIDIA seems to go with the full number.
The Copy/DMA Engines (exposed as the TRANSFER-only family) are primarily intended for CPU⇄GPU transfers. They would usually not achieve full throughput for an inside-GPU copy. So unless there's some driver magic, the Async Transfer Family should be used for CPU⇄GPU transfers (to reap the Async property, being able to do Graphics next to it unhindered). For inside-GPU copies it should be better in most cases to use the GRAPHICS|TRANSFER family.
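The rule of thumb above can be sketched as a family selection function. This is an illustration under assumptions, not a definitive implementation: the flags are passed as a plain vector (in real code they would come from vkGetPhysicalDeviceQueueFamilyProperties), the constants are local stand-ins for the VkQueueFlagBits values, and pickCopyFamily is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins mirroring VkQueueFlagBits values (assumption).
constexpr std::uint32_t kGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransferBit = 0x4; // VK_QUEUE_TRANSFER_BIT

// hostTransfer == true:  CPU<->GPU copy, prefer the TRANSFER-only family.
// hostTransfer == false: inside-GPU copy, use the graphics family, which
//                        accepts TRANSFER work implicitly.
// Returns -1 if no suitable family exists.
int pickCopyFamily(const std::vector<std::uint32_t>& familyFlags,
                   bool hostTransfer) {
    int graphicsFamily = -1;
    int dmaFamily = -1;
    for (std::size_t i = 0; i < familyFlags.size(); ++i) {
        const std::uint32_t f = familyFlags[i];
        if (graphicsFamily < 0 && (f & kGraphicsBit)) {
            graphicsFamily = static_cast<int>(i);
        }
        // TRANSFER without GRAPHICS or COMPUTE: the dedicated copy engine.
        if (dmaFamily < 0 && (f & kTransferBit) &&
            !(f & (kGraphicsBit | kComputeBit))) {
            dmaFamily = static_cast<int>(i);
        }
    }
    if (hostTransfer && dmaFamily >= 0) {
        return dmaFamily;
    }
    return graphicsFamily;
}
```

On a device without a dedicated transfer family, both kinds of copy end up on the graphics family, which matches the guaranteed fallback.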