How do I write to an image directly from the CPU when loading it in Vulkan?

Question

In Direct3D12, you can use "ID3D12Resource::WriteToSubresource" to enable zero-copy optimizations for UMA adapters.

What is the equivalent of "ID3D12Resource::WriteToSubresource" in Vulkan?

What is a "near zero-copy"? Is that like... one copy? Or are we talking about a non-integer number of copies? – krOoze, Jan 15 '19 at 12:55
In Vulkan, you have to write to a buffer and copy from the buffer to an image. – Hanetaka Chou, Jan 16 '19 at 08:29
But in D3D, you can write to the image directly. The driver knows the addressing scheme of the optimal tiling, so in the function "ID3D12Resource::WriteToSubresource" the driver can take the data which you pass and write it to the image according to that addressing scheme. – Hanetaka Chou, Jan 16 '19 at 08:32
Vulkan also knows the addressing scheme. You can copy from your linear buffer (which can be just good old RAM) to the optimally tiled image. The difference here is that Vulkan (unextended version) forces you to make the buffer allocation through Vulkan, and does not directly take your data pointer. But I cover that in the answer. – krOoze, Jan 16 '19 at 16:19
In D3D, the function "ID3D12Resource::WriteToSubresource" lets the driver write to the image directly, using the data that the application provides and the addressing scheme of the image, which the driver knows. – Hanetaka Chou, Jan 17 '19 at 04:53
"Near zero-copy" just means "zero-copy"/"no-copy" for most implementations. We use "near" here because it may still need "one-copy" for a few implementations. – Hanetaka Chou, Jan 17 '19 at 04:58
Vulkan forces me to make the buffer allocation. So I must copy the data to the buffer and then copy the buffer to the image. Do I copy one more time than in D3D? – Hanetaka Chou, Jan 17 '19 at 05:01
The Direct3D12 documentation uses the term "near zero-copy". It just means "zero-copy"/"no-copy". They use "near" because it may still need some copies for a few implementations. – Hanetaka Chou, Jan 17 '19 at 05:08
ID3D12Resource::WriteToSubresource doc: "Uses the CPU to **copy** data into a subresource". Zero-copy would be a move semantics, i.e. if `*pSrcData` pointer is swapped, but that is clearly **not** what the function does. – krOoze, Jan 17 '19 at 12:12
In Vulkan you do not have to "copy the data to the buffer", assuming the data is already in the buffer. Same as you not having to copy the data to your `*pSrcData` buffer first. The only difference is that (unextended) Vulkan forces you to allocate (and map) that buffer through Vulkan, and does not allow you to just `malloc\new` it (or use a pointer returned by some other API created that way). Which may mean you prefer to copy anyway for practical reasons, but that is not mandatory. – krOoze, Jan 17 '19 at 12:15
But in Vulkan you cannot write with the CPU. You have to submit a copy command to the GPU, which increases latency. – Hanetaka Chou, Jan 19 '19 at 16:15

Nicol Bolas · Answer 1 · 2019-01-15T14:15:47.860

What WriteToSubresource seems to do (in Vulkan-equivalent terms) is write pixel data from CPU memory to an image whose storage is in CPU-writable memory (hence the requirement that it first be mapped), to do so immediately without the need for a command buffer, and to be able to do so regardless of linear/tiling.

Vulkan doesn't have a way to do that. You can write directly to the backing storage for linear images (in the generic layout), but not for tiled ones. You have to use a proper transfer command for that, even on UMA architectures. Which means building a command buffer and submitting to a transfer-capable queue, since Vulkan doesn't have any immediate copy commands like that.

A Vulkan way to do this would essentially be a function that writes data to a mapped pointer to device memory storage as appropriate for a tiled VkImage in the pre-initialized layout that you intend to store in a particular region of memory. That way, you could then bind the image to that location of memory, and you'd be able to transition the layout to whatever you want.

But that would require adding such a function and allowing the pre-initialized layout to be used for tiled images (so long as the data is written by this function).

Since Vulkan is designed for mobile devices, I think Vulkan should add a function that allows the CPU to write to swizzled images directly? – Hanetaka Chou, Jan 15 '19 at 07:05

krOoze · Answer 2 · 2019-01-15T14:51:02.243

So, from the ID3D12Resource::WriteToSubresource documentation I read that it performs one copy, with marketese sprinkled on top.

Vulkan is an explicit API, which perfectly allows you to do a one-copy upload on UMA (or on anything else). It even allows you to do real zero-copy, if you stick with linear tiling.

A UMA device may look like this: https://vulkan.gpuinfo.org/displayreport.php?id=4919#memorytypes That is, it has only one heap, and the memory type is both DEVICE_LOCAL and HOST_VISIBLE.
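To make the "both DEVICE_LOCAL and HOST_VISIBLE" check concrete, here is a minimal sketch of the usual memory type search. It is written against plain integers so it compiles without the Vulkan SDK: the constants mirror the `VK_MEMORY_PROPERTY_*` bit values, while `findMemoryType` and the flag vector are hypothetical stand-ins for walking `VkPhysicalDeviceMemoryProperties::memoryTypes` and testing `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins mirroring the Vulkan bit values (assumption: used instead of
// <vulkan/vulkan.h> only to keep the sketch self-contained).
constexpr std::uint32_t kDeviceLocal  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t kHostVisible  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t kHostCoherent = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Returns the index of the first memory type that is permitted by
// allowedTypeBits and has every flag in `required`, or -1 if none does.
int findMemoryType(const std::vector<std::uint32_t>& typeFlags,
                   std::uint32_t allowedTypeBits,
                   std::uint32_t required) {
    assert(typeFlags.size() <= 32); // Vulkan exposes at most 32 memory types
    for (std::size_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed = ((allowedTypeBits >> i) & 1u) != 0;
        if (allowed && (typeFlags[i] & required) == required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

On a device like the one in the report, asking for `kDeviceLocal | kHostVisible` would succeed on the single memory type; on a typical discrete GPU the same request may fail, which is the hint that a staging copy is needed.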

So, if you create a linearly tiled image\buffer in Vulkan, vkMapMemory its memory, and then produce your data into that mapped pointer directly, there you have a (real) zero-copy.
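As a rough illustration of "producing your data into the mapped pointer", the sketch below generates pixels in place instead of building them elsewhere and copying. Everything here is an assumption for illustration: `LinearLayout` is a hypothetical stand-in for the `offset` and `rowPitch` fields that `vkGetImageSubresourceLayout` reports for a linear image, `mapped` stands for the pointer returned by `vkMapMemory`, and the format is assumed to be 4 bytes per pixel (RGBA8).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the VkSubresourceLayout fields used here.
struct LinearLayout {
    std::size_t offset;   // byte offset of the subresource inside the mapping
    std::size_t rowPitch; // bytes between the starts of consecutive rows
};

// Writes a simple gradient straight into the mapped image memory.
// Rows are addressed through rowPitch, which may be larger than
// width * bytesPerPixel because the implementation can pad rows.
void fillGradientInPlace(void* mapped, const LinearLayout& layout,
                         std::uint32_t width, std::uint32_t height) {
    constexpr std::size_t kBytesPerPixel = 4; // assumption: RGBA8 format
    assert(layout.rowPitch >= width * kBytesPerPixel);
    auto* base = static_cast<std::uint8_t*>(mapped) + layout.offset;
    for (std::uint32_t y = 0; y < height; ++y) {
        std::uint8_t* row = base + y * layout.rowPitch;
        for (std::uint32_t x = 0; x < width; ++x) {
            std::uint8_t* px = row + x * kBytesPerPixel;
            px[0] = static_cast<std::uint8_t>(x); // R
            px[1] = static_cast<std::uint8_t>(y); // G
            px[2] = 0;                            // B
            px[3] = 255;                          // A
        }
    }
}
```

If the memory type is not HOST_COHERENT, the writes would additionally need `vkFlushMappedMemoryRanges` before the device reads them.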

Since this is not always practical (i.e. you cannot always choose how things are allocated, e.g. if it is data returned from library function), there is an extension VK_EXT_external_memory_host (assuming your ICD supports it of course), which allows you to import your host data directly, without having to first make a Vulkan memory map.
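One practical catch with importing host pointers: as far as I read the extension, both the pointer and the allocation size have to be multiples of `VkPhysicalDeviceExternalMemoryHostPropertiesEXT::minImportedHostPointerAlignment`. The sketch below only shows that arithmetic; `roundUpToAlignment` and `isImportable` are hypothetical helper names, and the alignment value would come from querying the device rather than being hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds size up to the next multiple of alignment.
// Assumption: alignment is a non-zero power of two.
std::size_t roundUpToAlignment(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Checks the two conditions discussed above for a host pointer import:
// the address and the size are both multiples of the import alignment.
bool isImportable(const void* ptr, std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const auto address = reinterpret_cast<std::uintptr_t>(ptr);
    return size != 0 && address % alignment == 0 && size % alignment == 0;
}
```

Data handed back by a third-party library will often fail this check, in which case a copy into suitably aligned memory is still needed.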

Now, there are optimally tiled images. Optimal tiling is opaque in Vulkan (so far), and implementation-dependent, so you do not even know the addressing scheme without some reverse engineering. You, generally speaking, want to use optimally tiled images, because supposedly accessing them has better performance characteristics (at least in common situations).
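To give a feel for why tiled and linear data are not interchangeable, here is a toy addressing scheme. It is purely illustrative and is not the layout of any real driver or GPU (those are opaque, as said above): it assumes hypothetical 4x4 texel tiles stored one after another, with texels in row-major order inside each tile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kTile = 4; // hypothetical tile edge, in texels

// Texel index in an ordinary row-major (linear) image.
std::size_t linearIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    assert(x < width);
    return static_cast<std::size_t>(y) * width + x;
}

// Texel index in the toy tiled image: tiles are laid out row-major,
// and each tile holds kTile * kTile consecutive texels.
std::size_t toyTiledIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    assert(width % kTile == 0 && x < width);
    const std::size_t tilesPerRow = width / kTile;
    const std::size_t tile = (y / kTile) * tilesPerRow + (x / kTile);
    const std::size_t inTile = (y % kTile) * kTile + (x % kTile);
    return tile * (kTile * kTile) + inTile;
}
```

For an 8-texel-wide image, texel (4, 0) sits at index 4 in the linear image but at index 16 in the toy tiled one, so getting from one representation to the other always means rearranging the data, whoever performs it.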

This is where the single copy comes in. You would take your linearly tiled image (or buffer), and vkCmdCopy* it into your optimally tiled image. That copy is performed by the Device\GPU with all its bells and whistles, potentially faster than CPU, i.e. what I suspect they would call "near zero-copy".

edited Jan 15 '19 at 14:51

answered Jan 15 '19 at 14:30


Even if optimal tiling is opaque, the driver knows the addressing scheme. – Hanetaka Chou Jan 16 '19 at 08:22
So the Driver can provide a function like "ID3D12Resource::WriteToSubresource" to allow CPU writing to the image directly. The Driver can take the data which you pass and write to the image according to the addressing scheme. – Hanetaka Chou Jan 16 '19 at 08:23
The Driver can take the data which you pass and write to the image according to the addressing scheme. – Hanetaka Chou Jan 16 '19 at 08:24
Since you use the vkCmdCopy*, it is not "near zero-copy". – Hanetaka Chou Jan 16 '19 at 08:27
I am still not clear what a "near zero-copy" is. Nearest integer to zero is one. – krOoze Jan 16 '19 at 16:24
By default (unextended) Vulkan wants to be in control of the allocation of the source buffer. So if you yield to this restriction and construct your buffer in it, then it is also one (accelerated) copy as the DX version. – krOoze Jan 16 '19 at 16:28
With the `VK_EXT_external_memory_host` the features should be on par with the DX, except this is more verbose and low-level. I.e. I should be able to implement a `ID3D12Resource::WriteToSubresource`-like high-level function using this. – krOoze Jan 16 '19 at 16:30
ad "near zero-copy". My answer works under the assumption the author counts CPU-side `memcpy` as one copy. And GPU-side block transfer\blit\DMA (incl. tiling translation) as "near zero". With that in mind, I then dare to count `vkCmdCopy*` as "near zero" too. At least that is my reading of the `ID3D12Resource::WriteToSubresource` doc -- it is not a spec though, so I cannot entirely be sure what they meant by their nomenclature. – krOoze Jan 16 '19 at 16:57
In Vulkan: the CPU writes to a VkBuffer, and a copy command copies from the VkBuffer to the VkImage. – Hanetaka Chou Jan 17 '19 at 04:39
In D3D: the CPU writes to the image directly. Even if the application does not understand the addressing scheme of the image, the driver knows it. The application passes a pointer to the data to the function "ID3D12Resource::WriteToSubresource", and the driver writes to the image directly using that data. – Hanetaka Chou Jan 17 '19 at 04:43
Even if you use VK_EXT_external_memory_host, you still cannot write to the image directly without knowing the addressing scheme of the image. The driver must provide a function like "ID3D12Resource::WriteToSubresource" which takes the data that the application provides and writes to the image directly according to the addressing scheme that the driver knows. – Hanetaka Chou Jan 17 '19 at 04:47
“near zero-copy” just means “zero-copy”/"no-copy" for most implementations. We use "near" here because it may still need "one-copy" for few implementations. – Hanetaka Chou Jan 17 '19 at 05:03
The Direct3D12 Document uses the nomenclature "near zero-copy". The nomenclature "near zero-copy" just means “zero-copy”/"no-copy". They use "near" because it may still need some copies for few implementations. – Hanetaka Chou Jan 17 '19 at 05:07
As commented in Q: it cannot be zero-copy, as the `ID3D12Resource::WriteToSubresource` doc repeatedly implies there is at least one copy. "CPU write to `VkBuffer`" is not mandatory, and you are doing it for practical reasons (i.e. you can already have your data in the `VkBuffer`, if you **constructed**\initialized the data there to avoid this copy). `VK_EXT_external_memory_host` removes the restriction (and this practical concern) and lets you use your own pointer no matter how you got it. – krOoze Jan 17 '19 at 12:27
As long as I map the VkBuffer, I can initialize my data in the VkBuffer. It has nothing to do with VK_EXT_external_memory_host. – Hanetaka Chou Jan 19 '19 at 16:15
In D3D, you can use the CPU to copy. – Hanetaka Chou Jan 19 '19 at 16:16
But in Vulkan you must submit a copy command to the GPU. It increases latency. – Hanetaka Chou Jan 19 '19 at 16:17
With `VK_EXT_external_memory_host` you can skip mapping the buffer, and instead bind an arbitrary pointer as the backing memory for the buffer. The rest is just assumptions. You cannot be sure from the DX doc how exactly it is done and what kind of things it does behind your back. And you cannot be sure that talking to GPU async copy engines through Vulkan increases latency. And a CPU-controlled copy does not exactly sound like a win (more like a waste of cycles)... It would be interesting to measure though... – krOoze Jan 19 '19 at 18:32
This is simply not right. Vulkan requires double copy there (CPU -> CPU, CPU -> GPU), and DX *also* allows this approach (copy to intermediate buffer/row-major texture, then call `CopyTextureRegion`). It is not "near-zero" copy at all. The `WriteToSubresource` method is useful on UMA architectures where all GPU memory is CPU visible, so there is no *need* for the GPU to perform the copy, and the call also allows you to write to opaque textures – John Nov 28 '20 at 13:19
@John I am not sure what you refer to that is not right. `WriteToSubresource` is to my understanding a regular single copy, except MS brags that they can call an optimized `memcpy`. So that would simply be equivalent to a CPU->HOST copy. And I have documented in the answer how you can get real zero copy in Vulkan; i.e. how to build stuff directly in the memory of the UMA GPU, although with the caveat that you would have suboptimal linear tiling if it is a `VkImage`. – krOoze Nov 29 '20 at 07:58
But linear tiling has so many restrictions. It's not equivalent at all. In this case DX12 allows you to create a texture with less overhead than Vulkan. Not only that, Vulkan requires you to schedule a copy on the GPU, while in DX12 everything is done on the CPU. – rozina Dec 03 '20 at 08:58
@rozina Tiled and non-tiled are by definition different data. Therefore it always requires at least one copy between them. It is true it is nominally scheduled in Vulkan to GPU. But if it is the UMA, it might literally be the same DMA engine. – krOoze Dec 03 '20 at 11:24
