Here is a Vulkan rendering setup:
- Submit CB1 - contains render pass A with X draw calls.
- Submit CB2 - contains render pass B with Y draw calls where one of the draw calls samples from image which is render target of render pass A.
- During submission of CB2, a semaphore, which is shared with some external GPU API, signaling inserted to make sure CB2 execution is done before the render target of render pass B is used further (By CUDA in this case).This step is set correct and it is clear to me how it works.
All this happens on the same queue and in the above specified order. Render pass A and B share MSAA image,but each has unique color attachment into which MSAA is resolved.
My question is what is the best way to synchronize between CB1 finishing writing to render pass A RT,and one or more draw calls in CB2 sampling from that RT in shader during render pass B execution?
Based on some suggestion I received on Khronos Vulkan Slack group I tried synchronization via subpass dependencies.
Render pass A dependency setup :
VkSubpassDependency dependencies[2] = {};
dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
dependencies[0].dstSubpass = 0;
dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[0].dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dependencies[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
dependencies[0].dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dependencies[0].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
VkAttachmentReference colorReferences[2] = {};
colorReferences[0] = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL }; //this one for MSAA attachment
colorReferences[1] = { 1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
Render pass B dependency setup :
VkSubpassDependency dependencies[2] = {};
dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
dependencies[0].dstSubpass = 0;
dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[0].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
dependencies[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
dependencies[0].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
VkAttachmentReference colorReferences[2] = {};
colorReferences[0] = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
colorReferences[1] = { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
The above solution seems to be working. But I am not sure I have implicit synchronization guarantee about it. In this discussion one of the answers states
The only images where that isn’t the case is dependencies for render targets. And those are treated specially by subpasses anyway. Each subpass says what it is writing and to where, so the system has the information to build those memory dependencies explicitly.
Also in this article the author writes:
NOTE: Frame buffer operations inside a render pass happen in API-order, of course. This is a special exception which the spec calls out.
But in this SO question, the answer is that a sync between command buffers must be done, with events in that case.
So another option I have tried was to insert pipeline memory barrier during CB2 recording for images which are render targets in previously submitted CB (render pass A),before beginning of render pass B recording:
CB2 recording:
BeginCommandBuffer(commandBuffer);
...
if (vulkanTex->isRenderTarget)//for each sampler in this pass
{
VkImageMemoryBarrier imageMemoryBarrier = CreateImageBarrier();
imageMemoryBarrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageMemoryBarrier.image = vulkanTex->image;
VkImageSubresourceRange range = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
imageMemoryBarrier.subresourceRange = range;
vkCmdPipelineBarrier(
commandBuffer,
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
0,
0, nullptr,
0, nullptr,
1, &imageMemoryBarrier);
}
VkRenderPassBeginInfo renderPassBeginInfo = {};
...
.....
Of course in this scenario, I can set subpass dependency same for all render passes to be like Render pass B dependency setup.This one also works. But it requires from me recording more commands into CBs. So,which way is correct(given subpass dependency option is valid) and most optimal in terms of hardware efficiency?