In a past question, DirectX12 Upload Synchronization D3D12_HEAP_TYPE_UPLOAD, I got into trouble unmapping an upload resource, using it in a command list and executing, then mapping again and overwriting the data before the GPU had used it.

I must have assumed that mapping a second time would give me different memory to write into if the GPU hadn't finished using the unmapped data.

So if that's not the case, what is the point of unmapping in DirectX 12?

Chuck Walbourn said:

> Take data from the CPU and copy it into the 'intermediate' resource (unmapping it when complete since there's no need to keep the virtual memory address assignment around).

I guess I don't even know whether virtual memory lives in CPU or GPU memory. (Maybe it's in neither; perhaps it's some special memory on the GPU, or maybe it's device-dependent, hence the vagueness about what virtual memory actually is.)

Tom Huntington

1 Answer


First, I think it's worth addressing the remapping issue.

In DX11, the driver does all the heavy lifting, so when you map a resource (with write/discard) the driver is doing a bunch of work under the hood: specifically, allocating a new buffer and returning you its address (referred to as "resource renaming"). The driver tracks when the GPU is done with a particular bit of memory, and manages when unused memory can be re-used.

For modern APIs (both DX12 and Vulkan), when you create a resource it is explicitly bound to a location in memory. It's a much thinner layer (you're closer to the metal). When you map, you get a pointer. You can keep the resource mapped forever, and the pointer returned will always be valid and will always point to the address in memory the GPU will read from. The advantage here is that, since your application knows how it will be using these resources, you can optimize for your specific use case. For example, if you have a constant buffer for view-dependent data that updates once a frame, and you're buffering 3 frames, you can just create 3 resources, map them all, and round-robin through them, saving the overhead of repeated API calls.

On the virtual memory front: when you map, the pointer you get back is a virtual memory address mapped to somewhere in physical, CPU-side memory. So the mapping is definitely to physical CPU memory. How that memory is made visible to the GPU is probably device/system dependent, but I believe in most cases the memory lives in CPU-side memory and is read by the GPU over the PCIe bus (which is why you upload to a default-heap resource rather than just letting the GPU read from the upload resource directly).

Given that most apps these days are built for 64-bit architectures, we're generally not very limited on virtual address space, but it's still not a bad idea to clean up mappings you're not going to use, since they still consume resources (page-table entries for the virtual memory mapping, etc.).

Varrak
  • > *For example, if you have a constant buffer for view-dependent data that updates once a frame, and you're buffering 3 frames, you can just create 3 resources, map them all, and just round-robin through them - saving the overhead of API calls etc.* --- Good to know – Tom Huntington Jul 06 '22 at 18:16
  • So 64-bit architectures have more virtual address space than 32-bit; I think I get it. Also, this was helpful: https://en.wikipedia.org/wiki/Page_table – Tom Huntington Jul 06 '22 at 18:19
  • There are some great books that go deep on CPU architecture and how page tables etc. work. It's easy to fall into a deep, deep hole on these subjects ;) – Varrak Jul 06 '22 at 18:31
  • Also, it's worth noting that you can definitely get into trouble with the map/reuse pattern I mentioned above. You need to be *absolutely sure* that the GPU is done with the resource in question before you update it. I've had bugs where I overwrote a constant buffer before (or while) the GPU was consuming it, and they can be really gnarly to track down. – Varrak Jul 06 '22 at 18:44
  • Yes, I'm unsure whether, when you have n frames in flight, you need to create n+1 resources or not. – Tom Huntington Jul 06 '22 at 18:47
  • I hit exactly this issue recently. I ended up having to create n+1, because the CPU was well ahead in this case, and in some cases would be overwriting constant buffers while they were still in use (and n+1 because I was gating CPU work waiting on the swapchain, which meant the CPU was effectively an extra frame ahead). It's definitely tricky to get this absolutely correct, but the trade-off is, you can end up with a code path that's way more efficient. – Varrak Jul 06 '22 at 20:15
  • Oh, so the CPU wait on the swapchain allows you to have n resources; in other situations you will need more. – Tom Huntington Jul 06 '22 at 21:42