Windows C++ fast RGBA32 DX texture to RGB24 buffer

Question

If I already have a DirectX texture in hand, what is the fastest (lowest CPU utilization) way to get an RGB24 buffer in main RAM from it (dropping the A channel)?

Strictly, the format is DXGI_FORMAT_B8G8R8A8_UNORM. Does this mean ARGB?

I'm using https://github.com/bmharper/WindowsDesktopDuplicationSample to capture the Windows desktop, and the result is to be converted into a lossless RGB24 format in main RAM.

The existing code copies the GPU texture to a CPU texture, then uses memcpy to copy each line of the RGBA32 data to main RAM:

ID3D11Texture2D* gpuTex = nullptr;
hr                      = deskRes->QueryInterface(__uuidof(ID3D11Texture2D), (void**) &gpuTex);
....
ID3D11Texture2D* cpuTex = nullptr;
hr                      = D3DDevice->CreateTexture2D(&desc, nullptr, &cpuTex);
....
D3DDeviceContext->CopyResource(cpuTex, gpuTex);
....
D3D11_MAPPED_SUBRESOURCE sr;
hr = D3DDeviceContext->Map(cpuTex, 0, D3D11_MAP_READ, 0, &sr);
....    
for (int y = 0; y < (int) desc.Height; y++)
    memcpy(Latest.Buf.data() + y * desc.Width * 4, (uint8_t*) sr.pData + sr.RowPitch * y, desc.Width * 4);

Some possible approaches:

(1) Is it possible to have the GPU convert the texture to an RGB24 texture before copying it to the CPU texture? How can this be done in DX?

(2) If not, some assembly code to replace the memcpy line

memcpy(Latest.Buf.data() + y * desc.Width * 4, (uint8_t*) sr.pData + sr.RowPitch * y, desc.Width * 4);

to do the RGBA32 to RGB24 conversion for a line may be the best way. Can an example or reference be provided?
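
To make option (2) concrete, here is a minimal portable sketch of the per-row conversion, not a tuned implementation. It assumes the mapped data is DXGI_FORMAT_B8G8R8A8_UNORM, whose bytes are laid out in memory as B, G, R, A, and that "RGB24" means bytes in R, G, B order (swap the indices if B, G, R is wanted). The names BgraRowToRgb24 and BgraImageToRgb24 are hypothetical and do not come from the sample project.

```cpp
#include <cstddef>
#include <cstdint>

// Convert one row of BGRA32 pixels (memory order B, G, R, A)
// to tightly packed RGB24 (memory order R, G, B). Alpha is dropped.
inline void BgraRowToRgb24(const std::uint8_t* src, std::uint8_t* dst, std::size_t width) {
    for (std::size_t x = 0; x < width; ++x) {
        dst[0] = src[2]; // R
        dst[1] = src[1]; // G
        dst[2] = src[0]; // B
        src += 4;        // skip the alpha byte
        dst += 3;
    }
}

// Convert a whole image. srcPitch is the source row stride in bytes
// (sr.RowPitch for a mapped texture), which may exceed width * 4.
// The destination is assumed to be tightly packed: width * 3 bytes per row.
inline void BgraImageToRgb24(const std::uint8_t* src, std::size_t srcPitch,
                             std::uint8_t* dst, std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y)
        BgraRowToRgb24(src + y * srcPitch, dst + y * width * 3, width);
}
```

In the loop above, the memcpy call would become something like BgraRowToRgb24((uint8_t*) sr.pData + sr.RowPitch * y, Latest.Buf.data() + y * desc.Width * 3, desc.Width), assuming Latest.Buf is resized to desc.Width * desc.Height * 3 bytes.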

What do you plan to do with the resulting RGB24 image? It might be more efficient to let the output step process RGBA and throw away the alpha channel. — Botje, Apr 17 '20 at 11:20
@Botje The result data is not to be displayed. It would be best if D3DDeviceContext->CopyResource could copy 32-bit to 24-bit, but there are no 24-bit formats at all. — jw_, Apr 17 '20 at 12:13
That may be, but what are you going to _do_ with the output? Are you converting to 24bit RGB to save some memory or are you running some code that only works with 24bit RGB? Otherwise, keeping it in 32bit RGBA will be easier to work with (indexing and SIMD code, for example) — Botje, Apr 17 '20 at 12:18
@Botje It is to be transferred with low latency and high fidelity, which is why the A needs to be thrown away and YUV/RGB565 is not used. A recommendation for vector instructions that convert multiple 32-bit pixels to 24-bit and pack the results together is preferred. — jw_, Apr 20 '20 at 12:21
The first hit on "AVX RGBA RGB" returns a [PDF from Intel](https://software.intel.com/sites/default/files/c7/e0/21089) with a technique. — Botje, Apr 20 '20 at 07:09
@Botje Nice, though I'm not sure whether such a simple operation can benefit from AVX, since it is memory-limited. Anyway, it is official, out-of-the-box code. — jw_, Apr 20 '20 at 07:58
The last page mentions a 2x speedup. And you can distribute the task to multiple cores, too. — Botje, Apr 20 '20 at 08:07
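
For the vector-instruction idea discussed in the comments, a byte-shuffle sketch might look like the following. This is not the code from the linked Intel PDF; it is an illustration using the SSSE3 `_mm_shuffle_epi8` intrinsic, which handles 4 pixels (16 bytes in, 12 bytes out) per iteration. It assumes source bytes in B, G, R, A order and output in R, G, B order. The function name BgraRowToRgb24Shuffle is hypothetical. The SIMD path is compiled only when the compiler defines `__SSSE3__` (for example GCC/Clang with -mssse3); otherwise the scalar loop does all the work. Whether it beats a plain loop on a memory-bound workload would need measuring.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSSE3__)
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8
#endif

// Convert one row of BGRA32 (bytes B, G, R, A) to packed RGB24 (bytes R, G, B).
inline void BgraRowToRgb24Shuffle(const std::uint8_t* src, std::uint8_t* dst, std::size_t width) {
    std::size_t x = 0;
#if defined(__SSSE3__)
    // For each of 4 pixels pick source bytes R, G, B; the last 4 lanes are zeroed (-1).
    const __m128i mask = _mm_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12,
                                       -1, -1, -1, -1);
    for (; x + 4 <= width; x += 4) {
        const __m128i px  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + x * 4));
        const __m128i rgb = _mm_shuffle_epi8(px, mask);
        // Write exactly 12 bytes (8 + 4) so nothing past the row is touched.
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + x * 3), rgb);
        const int last4 = _mm_cvtsi128_si32(_mm_srli_si128(rgb, 8));
        std::memcpy(dst + x * 3 + 8, &last4, 4);
    }
#endif
    // Scalar tail (or the whole row when SSSE3 is not enabled).
    for (; x < width; ++x) {
        dst[x * 3 + 0] = src[x * 4 + 2]; // R
        dst[x * 3 + 1] = src[x * 4 + 1]; // G
        dst[x * 3 + 2] = src[x * 4 + 0]; // B
    }
}
```

Because each row is converted independently, the rows could also be split across several threads, as suggested above.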
