
I'm setting up a video renderer for my application, which uses the Direct3D9Ex interfaces, but when I use a big texture (desktop resolution) the video starts to slow down.

I was using DirectShow, but I ran into some problems with H264 and decided to switch to Media Foundation. I've searched a lot, but I could not work out how to render a video with DXVA. Because of that, I'm reading samples with IMFSourceReader (async) using MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING and MFVideoFormat_RGB32, so I can copy each frame to my own surface and then render it normally.

This is how I create the source reader:

    MFCreateAttributes(&m_Attributes, 4);

    m_Attributes->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, GRAPHICSDEVICE->GetDeviceManager());
    m_Attributes->SetUnknown(MF_SOURCE_READER_ASYNC_CALLBACK, this);
    m_Attributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
    m_Attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE);

    MFCreateSourceReaderFromURL(L"Video.mp4", m_Attributes, &m_SourceReader);
    MFCreateMediaType(&m_MediaType);
    MFSetAttributeSize(m_MediaType, MF_MT_FRAME_SIZE, m_VideoWidth, m_VideoHeight);

    m_MediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    m_MediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);

Then I post one ReadSample call, and in my Update method I do this:

    if (WaitForSingleObject(m_SampleEvent, 0) == WAIT_OBJECT_0)
    {
        if (m_SourceReader)
        {
            m_SourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, nullptr, nullptr, nullptr, nullptr);
        }
    }

This is part of my OnReadSample callback, which just copies one surface to another:

IDirect3DSurface9 * pSampleSurface = nullptr;

if (SUCCEEDED(GetD3DSurfaceFromSample(Sample, &pSampleSurface)))
{
    D3DLOCKED_RECT SampleRect;
    if (FAILED(pSampleSurface->LockRect(&SampleRect, nullptr, D3DLOCK_READONLY)))
    {
        pSampleSurface->Release();
        goto Quit;
    }

    BYTE * pVideo = (BYTE*)SampleRect.pBits;

    D3DLOCKED_RECT TextureRect;
    if (FAILED(m_Texture->LockRect(0, &TextureRect, nullptr, D3DLOCK_DISCARD)))
    {
        pSampleSurface->UnlockRect();
        pSampleSurface->Release();
        goto Quit;
    }

    BYTE * pDest = (BYTE*)TextureRect.pBits;

    for (unsigned int i = 0; i < m_VideoHeight; i++)
    {
        CopyMemory(pDest, pVideo, m_VideoWidth * 4);
        pDest += TextureRect.Pitch;
        pVideo += SampleRect.Pitch;
    }

    m_Texture->UnlockRect(0);
    pSampleSurface->UnlockRect();
    pSampleSurface->Release();
}
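
For reference, the row-by-row copy above can be written as a small portable helper. This is only a sketch: `CopyPitched` is a hypothetical name, not part of Direct3D or Media Foundation, and it assumes both buffers are plain CPU-addressable memory, as the `pBits` pointers returned by `LockRect` are. It clamps the row width to both pitches and collapses to a single `memcpy` when the rows are contiguous. It does not remove the video-memory readback itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `height` rows of `rowBytes` bytes each between
// two buffers whose rows may be padded to different pitches (strides).
inline void CopyPitched(std::uint8_t* dst, std::size_t dstPitch,
                        const std::uint8_t* src, std::size_t srcPitch,
                        std::size_t rowBytes, std::size_t height)
{
    // Never read or write past the end of a row in either buffer.
    rowBytes = std::min({rowBytes, dstPitch, srcPitch});

    if (dstPitch == srcPitch && rowBytes == dstPitch) {
        // Rows are contiguous with no padding: one large copy is enough.
        std::memcpy(dst, src, rowBytes * height);
        return;
    }

    // General case: rows are padded differently, so copy them one at a time.
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

In the callback this would be called roughly as `CopyPitched(pDest, TextureRect.Pitch, pVideo, SampleRect.Pitch, m_VideoWidth * 4, m_VideoHeight)`.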

So, my current results are acceptable for a debug environment, but when I change my application's resolution to my desktop resolution (from 800x600 to 1366x768), things start to get a lot slower.

Do I have to use something like DXVA? Can I tweak the current code to run faster? Where can I find some good samples?

Roman R.

1 Answer


The main speed-related factor here is being able to decode on the GPU into a texture and then use that texture without downloading the data into system memory, if possible.

You are setting MF_SOURCE_READER_D3D_MANAGER, and eventually you read data from the texture. So DXVA is already working for you, and it should be reasonably fast (that is, you don't need to accelerate ReadSample per se). Calling IDirect3DSurface9::LockRect and accessing the bits is presumably what makes it slow; you might want to disable the texture-reading step and compare performance to verify.
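
One way to run that comparison is to time the suspect section with and without the copy. The following is a minimal sketch using only the C++ standard library; `FrameTimer` is a hypothetical helper name, not something from Media Foundation or Direct3D.

```cpp
#include <chrono>

// Hypothetical helper: accumulates the wall-clock time spent in a section
// of code across many calls and reports the average per call.
class FrameTimer {
public:
    // Runs `fn` once and adds its duration to the running total.
    template <typename Fn>
    void Measure(Fn&& fn) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        total_ += std::chrono::steady_clock::now() - start;
        ++count_;
    }

    // Average duration of the measured calls, in milliseconds.
    double AverageMs() const {
        if (count_ == 0) return 0.0;
        return std::chrono::duration<double, std::milli>(total_).count() / count_;
    }

    unsigned Count() const { return count_; }

private:
    std::chrono::steady_clock::duration total_{};
    unsigned count_ = 0;
};
```

Wrapping the body of the callback in `timer.Measure([&] { ... });` once with the `LockRect` copy and once with it commented out, then comparing `AverageMs()` over a few hundred frames, should show whether the readback dominates.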

Roman R.
  • How so? I mean, don't I have to read the texture and then put it on another one to present? – Cainan Kenji Kita Dec 24 '18 at 16:13
  • It's quite typical, e.g. ["I profiled and found that copying memory from GPU to CPU is very expensive. I am looking for your inputs to alleviate this performance loss."](https://software.intel.com/en-us/forums/intel-media-sdk/topic/609587) - you will find many of these. – Roman R. Dec 24 '18 at 16:37
  • I'm sorry, but I don't see the connection between that and my problem. Apparently he has some FPS issues as well, but he is using a different media API. Should I use IOPattern with ReadSample? – Cainan Kenji Kita Dec 24 '18 at 17:09
  • The connection is that I suggest you stop reading from video memory (from the texture) and compare the performance of your code. I would expect it to be much faster, which would prove that `ReadSample` is okay and that reading from the texture into system memory is what is slow. The link from my previous comment actually shows that this is a fundamental challenge, applicable to all APIs. You will find far fewer similar discussions about Media Foundation just because it is not a popular API, but if you look at DirectShow, Direct3D, and vendor SDKs you will see everyone is dealing with this. – Roman R. Dec 24 '18 at 17:20
  • Got it, but with that in mind, how should I render it to my device? That was the only way I found to do it. – Cainan Kenji Kita Dec 24 '18 at 17:38
  • You can render it as a texture in context of Direct3D enabled application. Then, DirectShow/Media Foundation [Enhanced Video Renderer (EVR)](https://docs.microsoft.com/en-us/windows/desktop/medfound/enhanced-video-renderer) is capable of rendering media samples with texture buffers. And then if you prefer D3D11 to D3D9, which makes sense overall but you might have your own reasons to stay with ver 9, then [DX11VideoRenderer](https://github.com/Microsoft/Windows-classic-samples/tree/master/Samples/DX11VideoRenderer) is the Media Foundation renderer to render frames with D3D11 texture samples. – Roman R. Dec 24 '18 at 17:43
  • Sorry for being a newbie at this subject, but I think I'm starting to figure it out. I did not understand the first line of your answer, but I read about EVR, and it seems they create a sample using a swap chain so that the user doesn't have to copy the texture data into system memory and can present to the back buffer instead. Am I right? – Cainan Kenji Kita Dec 24 '18 at 19:07
  • The first sentence of my previous comment (is this the one you have a hard time comprehending, or the one from the actual answer above?) means the following: you can render video frames without further use of DirectShow or Media Foundation APIs. You have video frames as textures, and you can extract them from Media Foundation `IMFSample` objects for use in Direct3D rendering without pulling the actual data from the textures (hence, no GPU-to-CPU transfers). That is, it's the scenario where you don't need or want EVR or DX11VideoRenderer at all. – Roman R. Dec 24 '18 at 19:21
  • "You have video frames as textures and you can extract them from Media Foundation IMFSample objects for use in Direct3D rendering, without pulling actual data from the textures (hence, no GPU to CPU transfers)". How? Samples? Links? – Cainan Kenji Kita Dec 26 '18 at 05:09
  • Your code snippet suggests that you can extract `IDirect3DSurface9` from samples (`GetD3DSurfaceFromSample` in your code). If you look at Direct3D 9 demos, you will find that rendering applications work with `IDirect3DSurface9` for their needs; you could use the surfaces you read from media files in the same way. That is, without doing `LockRect` and friends. If you use Direct3D 11 (which I believe is recommended over Direct3D 9), there is a similar method of extracting `ID3D11Texture2D` from samples, and then the texture can be used in D3D11 presentation (D3D11 demos, integration with Unity via DX11, etc.). – Roman R. Dec 26 '18 at 06:26
  • Are you sure that they use IDirect3DSurface9? Because I've looked at some, and they use an IDirect3DSurface9 obtained from an IDirect3DTexture9, so it's just a single SetTexture and done. I'm using Direct3D 9 because I already have a lot of stuff done, but I plan to upgrade to Direct3D 11. – Cainan Kenji Kita Dec 26 '18 at 19:56