
I am using the Vulkan graphics API (via BGFX) to render, and I have been measuring how much wall-clock time my calls take.

What I do not understand is that vkAcquireNextImageKHR() is always fast and never blocks, even though I disable the time-out and use a semaphore to wait for presentation.

The presentation is locked to a 60 Hz display rate, and I do see my main loop run at 16.6 or 33.3 ms per iteration.

Shouldn't I see the wait-time for this display rate show up in the length of the vkAcquireNextImageKHR() call?

The profiler measures this call at about 0.2 ms, never a substantial part of a frame.

VkResult result = vkAcquireNextImageKHR(
    m_device
  , m_swapchain
  , UINT64_MAX
  , renderWait
  , VK_NULL_HANDLE
  , &m_backBufferColorIdx
);

Target hardware is a handheld console.

Bram

1 Answer


The whole purpose of Vulkan is to alleviate CPU bottlenecks. Making the CPU stop until the GPU is ready for something would be the opposite of that, especially if the CPU itself isn't actually going to use the result of this operation.

As such, all the vkAcquireNextImageKHR function does is let you know which image will be made available to you next. This is the minimum that needs to happen in order for you to be able to use that image (for example, by building command buffers that reference the image in some way). However, that image is not yet available to you.

This is why this function requires you to provide a semaphore and/or a fence: so that the process which consumes the image can wait for the image to be made available.

If the process which consumes the image is just a bunch of commands in a command buffer (i.e., something you submit with vkQueueSubmit), you can simply have that batch of work wait on the semaphore given to the acquire operation. That means all of the waiting happens on the GPU. Where it belongs.

The fence is there if you (for some reason) want the CPU to be able to wait until the acquire is done. But Vulkan, as an explicit, low-level API, forces you to explicitly say that this is what you want (and it almost never is what you want).

Nicol Bolas
  • Thanks. I misread the docs: the semaphore will be signaled, not waited upon. So in that case, I expected my block to show up in the `vkQueueSubmit()` call that uses that semaphore in `pWaitSemaphores`, but strangely it does not show up there either. I see it in a `vkQueueWaitIdle()` instead. – Bram Feb 26 '20 at 18:43
  • @Bram: Queue submit operations don't wait on semaphores; the *GPU* waits on semaphores. That's why they're GPU constructs and not POSIX mutexes or somesuch. GPU wait operations shouldn't force CPU waits. Also, you should basically never call `vkQueueWaitIdle`. – Nicol Bolas Feb 26 '20 at 18:44
  • One tricky bit to consider is that if you only ever use semaphores, then the Vulkan driver will happily let you queue up an arbitrary number of frame submissions. So your display could be on frame 5 and your CPU is working on generating frame 900. That's the value in associating a fence with each image, but not checking the fence until that image comes around again from the acquire. – Jherico Feb 26 '20 at 20:52
  • @NicolBolas Ah, thanks. You have been very helpful. So, yeah. BGFX uses vkQueueWaitIdle() a lot, every time it switches to a new `VkFramebuffer`. I tried taking those waits out, but that causes GPU crashes. https://github.com/bkaradzic/bgfx/blob/dba8b8efef784b235c0a008267459c258bc0fa79/src/renderer_vk.cpp#L5875 – Bram Feb 26 '20 at 21:10
  • @Jherico: It's pretty much impossible to render a non-static scene without, at some point, doing a GPU/CPU sync. If you change the contents of memory or need to alter a descriptor set, you're going to have to prevent changing memory/sets that the GPU is using. And that requires a sync. But it only happens as you need it and where you need it, rather than being a built-in part of some API function. – Nicol Bolas Feb 26 '20 at 21:14
  • @Bram: "*I tried taking those waits out, but that causes GPU crashes.*" If they're relying on that for synchronizing other things too (like memory/descriptor accesses), then just yanking them out won't work. My point is that in a well-constructed application, the CPU should never be waiting for a queue to idle, as that represents losing GPU performance. The CPU may wait for a particular submission to complete, but that would be a wait on a *fence*, not for a queue to stop doing *anything*. – Nicol Bolas Feb 26 '20 at 21:16
  • @NicolBolas granted, but most of my direct experience has been working with Vulkan sample code where it's very easy to create an entire (toy) application that has no per-frame sync points. – Jherico Feb 26 '20 at 23:25
  • This makes sense but I couldn't figure out by myself that [vkAcquireNextImageKHR](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkAcquireNextImageKHR.html) does not block. Should I assume that API calls don't block the CPU unless explicitly specified? What really confuses me is the `timeout` parameter... – tuket Mar 07 '22 at 21:21
  • One of the possible return codes for [vkAcquireNextImageKHR](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkAcquireNextImageKHR.html) is VK_TIMEOUT, which doesn't make sense to me unless the function can indeed block CPU execution :S – tuket Mar 07 '22 at 21:24
  • @tuket: It does not block until the next image is available. But it does have to block until the display engine can figure out what the next image actually will be. – Nicol Bolas Mar 07 '22 at 21:32
  • Actually, both [vkWaitSemaphores](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkWaitSemaphoresKHR.html) and [vkWaitForFences](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkWaitForFences.html) have a timeout parameter, so I'm not sure what the point of having another one is. – tuket Mar 07 '22 at 21:33
  • @NicolBolas Ah, makes sense, thank you. I have found the [documentation that talks about this](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/chap33.html#_wsi_swapchain) "If the specified timeout period expires before an image is acquired, vkAcquireNextImageKHR returns VK_TIMEOUT" "The presentation engine may not have finished reading from the image at the time it is acquired, so the application must use semaphore and/or fence to ensure..." – tuket Mar 07 '22 at 21:42
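
The run-ahead described in the comments above can be illustrated with a toy timing model. This is plain C++ with no Vulkan calls, and everything in it is an assumption made for illustration: the names `simulate` and `maxCpuLead`, the fixed per-frame CPU cost, and the simplification that a frame's fence signals at the moment that frame is presented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only. All times are in milliseconds.
struct FrameTimes {
    double cpuStart;   // when the CPU starts recording this frame
    double presented;  // when the display shows this frame
};

// frames:      number of frames to simulate
// cpuWorkMs:   CPU time to record and submit one frame (never blocks)
// vsyncMs:     display refresh interval, e.g. 16.6 for 60 Hz
// maxInFlight: 0 means the CPU never waits on a fence; N > 0 means that
//              before frame i the CPU waits on the fence of frame i - N
std::vector<FrameTimes> simulate(std::size_t frames, double cpuWorkMs,
                                 double vsyncMs, std::size_t maxInFlight)
{
    assert(cpuWorkMs >= 0.0 && vsyncMs > 0.0);
    std::vector<FrameTimes> t(frames);
    double cpuClock = 0.0;
    for (std::size_t i = 0; i < frames; ++i) {
        // Stand-in for a CPU-side fence wait on an older frame.
        if (maxInFlight > 0 && i >= maxInFlight)
            cpuClock = std::max(cpuClock, t[i - maxInFlight].presented);
        t[i].cpuStart = cpuClock;
        cpuClock += cpuWorkMs;  // acquire + record + submit
        // The display shows at most one frame per refresh interval,
        // and never before the frame has been submitted.
        const double nextVsync = (i == 0) ? 0.0 : t[i - 1].presented + vsyncMs;
        t[i].presented = std::max(cpuClock, nextVsync);
    }
    return t;
}

// Largest number of earlier frames still waiting to be presented at the
// moment the CPU starts a new frame.
std::size_t maxCpuLead(const std::vector<FrameTimes>& t)
{
    std::size_t worst = 0;
    for (std::size_t i = 0; i < t.size(); ++i) {
        std::size_t pending = 0;
        for (std::size_t j = 0; j < i; ++j)
            if (t[j].presented > t[i].cpuStart) ++pending;
        worst = std::max(worst, pending);
    }
    return worst;
}
```

Under these assumptions, with 1 ms of CPU work per frame and a 16.6 ms refresh interval, `simulate(100, 1.0, 16.6, 0)` lets the CPU run dozens of frames ahead of the display. `simulate(100, 1.0, 16.6, 2)` keeps it within two frames, and the CPU loop settles at one iteration per refresh interval even though no acquire or submit step in the model ever blocks.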