Why can some commands be recorded only outside of a render pass?

Question

I don't know whether it's an API feature (I'm almost sure it's not) or something GPU-specific, but why, for example, can vkCmdWaitEvents be recorded both inside and outside of a render pass, while vkCmdResetEvent can be recorded only outside? The same applies to other commands.

Nicol Bolas · Accepted Answer · 2019-05-18T18:15:58.470

When it comes to setting events in particular, those commands wreak havoc with how the render pass model interacts with tile-based renderers.

Recall that the whole point of the complexity of the render pass model is to service the needs of tile-based renderers (TBRs). When a TBR encounters a complex series of subpasses, the way it wants to execute them is as follows.

It does all of the vertex processing stages for all of the rendering commands for all of the subpasses, all at once, storing the resulting vertex data in a buffer for later consumption. Then for each tile, it executes the rasterization stages for each subpass on the primitives that are involved in the building of that tile.
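To make that ideal ordering concrete, here is a small C model of it. It is a hypothetical sketch, not how any real driver is written: `Step`, `StepKind` and `tbr_schedule` are made-up names, and the model ignores binning details (which primitives actually land in which tile).

```c
#include <stddef.h>

/* Hypothetical model, not a real driver: the kinds of work a tile-based
   renderer schedules for one render pass. */
typedef enum { STEP_VERTEX, STEP_RASTER } StepKind;

typedef struct {
    StepKind kind;
    int subpass; /* which subpass the step belongs to */
    int tile;    /* tile index for raster steps, -1 for vertex steps */
} Step;

/* Fills `out` with the idealized TBR execution order for a render pass with
   `num_subpasses` subpasses drawn into `num_tiles` tiles:
   1. vertex processing for every subpass, all up front;
   2. then, tile by tile, rasterization for every subpass.
   Returns the number of steps written (num_subpasses * (1 + num_tiles)),
   or 0 if `cap` is too small to hold them. */
size_t tbr_schedule(int num_subpasses, int num_tiles, Step *out, size_t cap) {
    size_t needed = (size_t)num_subpasses * (size_t)(1 + num_tiles);
    if (needed > cap) return 0;
    size_t n = 0;
    for (int s = 0; s < num_subpasses; ++s)
        out[n++] = (Step){STEP_VERTEX, s, -1};
    for (int t = 0; t < num_tiles; ++t)
        for (int s = 0; s < num_subpasses; ++s)
            out[n++] = (Step){STEP_RASTER, s, t};
    return n;
}
```

For 2 subpasses and 3 tiles it yields 8 steps: both vertex steps first, then tile 0 (subpass 0, subpass 1), tile 1, and tile 2.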

Note that this is the ideal case; specific things can make it fail to various degrees, but even then, it tends to fail in batches, where you can still execute several subpasses of a render pass like this.

So let's say you want to set an event in the middle of a subpass. OK... when does that actually happen? Remember that the set-event command actually sets the event after all of the preceding commands have completed. In a TBR, if everything is proceeding as above, when does it get set? Ideally, all vertex processing for the entire render pass happens before any rasterization, so setting the event has to happen after the vertex processing is done. And all rasterization happens on a tile-by-tile basis, processing whichever primitives overlap that tile. Because each rendering command's work is fragmented across tiles, it's difficult to know when an individual rendering command has completed.

So the only place the set-event could happen is... after the entire render pass has completed. That is obviously not very useful.
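The same idealized schedule (vertex steps for every subpass, then every subpass on each tile in turn) gives a quick way to see this. The arithmetic below, with hypothetical function names, finds the last schedule step that still contains work from subpass `s`, assuming conservatively that the draw touches every tile; a set-event recorded after that draw could not signal any earlier.

```c
/* Hypothetical model: in the idealized TBR order (all vertex work for every
   subpass first, then each tile runs every subpass), return the 0-based
   index of the last step that still contains work from subpass `s`.
   Assumes, conservatively, that the draw covers every tile. */
int last_step_for_subpass(int num_subpasses, int num_tiles, int s) {
    int last_tile = num_tiles - 1;
    return num_subpasses              /* the up-front vertex steps       */
         + last_tile * num_subpasses  /* raster steps of earlier tiles   */
         + s;                         /* position within the last tile   */
}

/* Total number of steps in the same idealized schedule. */
int total_steps(int num_subpasses, int num_tiles) {
    return num_subpasses * (1 + num_tiles);
}
```

With 2 subpasses and 1000 tiles, `last_step_for_subpass(2, 1000, 0)` is 2000 out of 2002 steps: an event set right after a draw in the very first subpass could only be signalled once both subpasses had already run on 999 tiles, i.e. at the very end of the render pass.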

The alternative is to have the act of issuing a vkCmdSetEvent call fundamentally reshape how the implementation builds the entire render pass, breaking the subpass up into the work that happens before the event and the work that happens after it.

But the reason why VkRenderPass is so big and complex, the reason why VkPipelines have to reference a specific subpass of a render pass, and the reason why vkCmdPipelineBarrier within a render pass requires you to specify a subpass self-dependency, is so that a TBR implementation can know up front when and where it will have to break the ideal TBR rendering scheme. Having a function introduce that breakup without forewarning works against this idea.

Furthermore, Vulkan is designed so that, if something would have to be implemented highly inefficiently, then either it is impossible to do directly or the API makes it look obviously inefficient. vkCmdSetEvent and vkCmdResetEvent cannot be efficiently implemented within a render pass on TBR hardware, so you can't record them there, period.
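The rule itself is written down per command: each command's "Command Properties" table in the Vulkan specification lists its render pass scope as Inside, Outside, or Both. The hypothetical checker below encodes a small, hand-copied subset of those entries; the names `RenderPassScope`, `CommandInfo` and `may_record` are made up, and in practice the validation layers perform this check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Render pass scope of a command, as listed in its "Command Properties"
   table in the Vulkan specification. */
typedef enum { SCOPE_OUTSIDE, SCOPE_INSIDE, SCOPE_BOTH } RenderPassScope;

typedef struct { const char *name; RenderPassScope scope; } CommandInfo;

/* A small, hand-copied subset; not exhaustive. Inside a render pass,
   vkCmdPipelineBarrier additionally needs a subpass self-dependency,
   which this sketch does not model. */
static const CommandInfo kCommands[] = {
    {"vkCmdSetEvent",        SCOPE_OUTSIDE},
    {"vkCmdResetEvent",      SCOPE_OUTSIDE},
    {"vkCmdWaitEvents",      SCOPE_BOTH},
    {"vkCmdPipelineBarrier", SCOPE_BOTH},
    {"vkCmdCopyBuffer",      SCOPE_OUTSIDE},
    {"vkCmdDispatch",        SCOPE_OUTSIDE},
    {"vkCmdDraw",            SCOPE_INSIDE},
};

/* Hypothetical validation helper: may `name` be recorded, given whether a
   render pass instance is currently active? Unknown names are rejected. */
bool may_record(const char *name, bool inside_render_pass) {
    for (size_t i = 0; i < sizeof kCommands / sizeof kCommands[0]; ++i) {
        if (strcmp(kCommands[i].name, name) != 0) continue;
        switch (kCommands[i].scope) {
        case SCOPE_BOTH:    return true;
        case SCOPE_INSIDE:  return inside_render_pass;
        case SCOPE_OUTSIDE: return !inside_render_pass;
        }
    }
    return false;
}
```

This captures the asymmetry from the question: `may_record("vkCmdWaitEvents", true)` is true, while `may_record("vkCmdResetEvent", true)` is false.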

Note that vkCmdWaitEvents doesn't have this problem, because the system knows that the wait is waiting on something outside of a render pass. So it's just some particular stage that has to wait on the event to complete. If it's a vertex stage doing the waiting, it's easy enough to set that wait at the beginning of that command's processing. If it's a fragment stage, it can just insert the wait at the beginning of all rasterization processing; it's not the most efficient way to handle it, but since all vertex processing has executed, odds are good that the event has been set by then.
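A tiny model of the two wait placements just described, in terms of the same idealized schedule (vertex steps for all subpasses, then per-tile raster steps). The point is that neither placement changes the shape of that schedule; a wait is just inserted in front of an existing step. `WaitStage` and `wait_insert_index` are hypothetical names.

```c
/* Which pipeline stage waits on the event (hypothetical, simplified). */
typedef enum { WAIT_IN_VERTEX_STAGE, WAIT_IN_FRAGMENT_STAGE } WaitStage;

/* Returns the 0-based index of the schedule step the wait is placed in
   front of, for a waiting command in subpass `s`:
   - vertex stage: in front of that subpass's vertex step;
   - fragment stage: once, in front of the very first raster step, i.e.
     after all vertex work for the whole render pass (simple but
     conservative, since later subpasses end up waiting too). */
int wait_insert_index(WaitStage stage, int num_subpasses, int s) {
    if (stage == WAIT_IN_VERTEX_STAGE)
        return s;
    return num_subpasses;
}
```

With 3 subpasses, a vertex-stage wait in subpass 1 lands before step 1, while any fragment-stage wait lands before step 3, the first raster step.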

For other kinds of commands, recall that the dependency graph of everything that happens within a render pass is defined within VkRenderPass itself. The subpass dependency graph is there. You can't even issue a normal vkCmdPipelineBarrier within a render pass, not unless that subpass has an explicit self-dependency in the subpass dependency graph.
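The self-dependency rule can be sketched as a lookup over the render pass's dependency list. `SubpassDependency` here is a hypothetical, trimmed-down mirror of `VkSubpassDependency` that keeps only `srcSubpass` and `dstSubpass`; among other conditions, the real rule also requires the barrier's stage and access masks to be covered by that self-dependency, which this sketch does not check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VK_SUBPASS_EXTERNAL, which Vulkan defines as (~0U). */
#define SUBPASS_EXTERNAL (~0U)

/* Hypothetical, trimmed-down mirror of VkSubpassDependency: only the two
   fields this check needs (the real struct also has stage/access masks
   and dependency flags). */
typedef struct {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
} SubpassDependency;

/* A pipeline barrier recorded inside `subpass` is only allowed if the
   render pass declared a self-dependency for it
   (srcSubpass == dstSubpass == subpass). */
bool barrier_allowed_in_subpass(const SubpassDependency *deps, size_t count,
                                uint32_t subpass) {
    for (size_t i = 0; i < count; ++i)
        if (deps[i].srcSubpass == subpass && deps[i].dstSubpass == subpass)
            return true;
    return false;
}
```

Because the list is fixed when the render pass is created, the implementation knows before recording starts exactly which subpasses may contain barriers.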

So what good would it be to issue a compute shader dispatch or a memory transfer operation in the middle of a subpass if you cannot wait for the operation to finish in that subpass or a later one? If you can't wait on the operation to end, then you cannot use its results. And if you can't use its results... you may as well have issued it before the render pass.

And the reason you can't have other dependencies goes back to TBRs. The dependency graph is an inseparable part of the render pass to allow TBRs to know up-front what the relationship between subpasses is. That allows them to know whether they can build their ideal renderer, and when/where that might break down.

Since the TBR model of render passes makes such waiting impractical, there's no point in allowing you to issue such commands.

@Nicol Bolas, I highly appreciate how you explain complex topics. Thank you very much. — nikitablack, May 15 '19 at 13:02

score 3 · Answer 2 · edited May 14 '19 at 22:20

Because a render pass is a special construct that implies focusing work solely on the framebuffer.

In addition, subpasses are allowed to run in parallel unless there is an explicit dependency between them.

This has an effect on how they would need to be synchronized with commands in the other subpasses.
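Which subpasses may overlap follows from the dependency list: two subpasses are ordered only if a chain of dependencies connects them. Below is a hypothetical sketch using Warshall's transitive closure; the names are made up, the subpass count is capped at `MAX_SUBPASSES`, indices are assumed valid, and external dependencies are ignored.

```c
#include <stdbool.h>

#define MAX_SUBPASSES 8

/* Given `dep_count` (src, dst) subpass pairs, set ordered[a][b] to true if
   a chain of dependencies makes subpass b wait on subpass a. */
void order_closure(int n, const int (*deps)[2], int dep_count,
                   bool ordered[MAX_SUBPASSES][MAX_SUBPASSES]) {
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            ordered[a][b] = false;
    for (int i = 0; i < dep_count; ++i)
        ordered[deps[i][0]][deps[i][1]] = true;
    /* Warshall's algorithm: if a -> k and k -> b, then a -> b. */
    for (int k = 0; k < n; ++k)
        for (int a = 0; a < n; ++a)
            for (int b = 0; b < n; ++b)
                if (ordered[a][k] && ordered[k][b])
                    ordered[a][b] = true;
}

/* Two distinct subpasses ordered in neither direction may, in principle,
   execute concurrently. */
bool may_overlap(int a, int b, bool ordered[MAX_SUBPASSES][MAX_SUBPASSES]) {
    return a != b && !ordered[a][b] && !ordered[b][a];
}
```

With dependencies 0→2, 1→2 and 2→3, subpasses 0 and 1 may overlap, while 0 and 3 are ordered through subpass 2.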

Copies dominate use of the memory bus and would stall render work that depends on them. Doing them inside the render pass creates a big GPU bubble that can easily be avoided by putting them outside and making sure they're finished by the time you start the render pass.

Some hardware also has dedicated copy units that are separate from the graphics hardware, so the less synchronizing you need to do between them the better.

answered May 14 '19 at 09:55 · ratchet freak · edited by Nicol Bolas

Regarding copies: if they are recorded synchronously like this, they are likely just a graphics operation on a quad under the hood. That would mean switching the current framebuffer to the target of the copy (so basically starting another render pass instance in the middle of a render pass instance). There could be some tile-based copies that would be a good fit with the render pass design, but not the regular `vkCmdCopy*` commands (compare `vkCmdResolveImage` with `pResolveAttachments`). – krOoze May 14 '19 at 13:34
This has effect on how -> This has **an** effect on how – Krupip May 14 '19 at 18:54
