Windows Media Foundation MFT buffering and video quality issues (Loss of colors, not so smooth curves, especially text)

Question

I'm trying to encode RGBA buffers captured from an image (RGBA) source (Desktop/Camera) into raw H264 using Windows Media Foundation, transfer them and decode the raw H264 frames received at the other end in real time. I'm trying to achieve at least 30 fps. The encoder works pretty good but not the decoder.

I understand Microsoft WMF MFTs buffer up to 30 frames before emitting the encoded/decoded data.

The image source would emit frames only when there is a change occurs and not a continuous stream of RGBA buffers, so my intention is to obtain a buffer of encoded/decoded data for each and every input buffer to the respective MFT so that I can stream the data in real time and also render it.

Both the encoder and decoder are able to emit at least 10 to 15 fps when I make the image source to send continuous changes (by stimulating the changes). The encoder is able to utilize hardware acceleration support. I'm able to achieve up to 30 fps in the encoder end, and I'm yet to implement hardware assisted decoding using DirectX surfaces. Here the problem is not the frame rate but buffering of data by MFTs.

So, I tried to drain the decoder MFT by sending the MFT_MESSAGE_COMMAND_DRAIN command, and repeatedly calling ProcessOutput until the decoder returns MF_E_TRANSFORM_NEED_MORE_INPUT. What happens now is the decoder now emits only one frame per 30 input h264 buffers, I tested it with even a continuous stream of data, and the behavior is same. Looks like the decoder drops all the intermediate frames in a GOP.

It's okay for me if it buffers only first few frames, but my decoder implementation outputs only when it's buffer is full all the time even after the SPS and PPS parsing phase.

I come across Google's chromium source code (https://github.com/adobe/chromium/blob/master/content/common/gpu/media/dxva_video_decode_accelerator.cc), they follow the same approach.

mpDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL);

My implementation is based on https://github.com/GameTechDev/ChatHeads/blob/master/VideoStreaming/EncodeTransform.cpp

and

https://github.com/GameTechDev/ChatHeads/blob/master/VideoStreaming/DecodeTransform.cpp

My questions are, Am I missing something? Is Windows Media Foundation suitable for real-time streaming?. Whether draining the encoder and decoder would work for real-time use cases?.

There are only two options for me, make this WMF work of real-time use case or go with something like Intel's QuickSync. I chose WMF for my POC because Windows Media Foundation implicitly supports Hardware/GPU/Software fallbacks in case any of the MFT is unavailable and it internally chooses best available MFT without much coding.

I'm facing video quality issues, though the bitrate property is set to 3Mbps. But it is least in priority compared to buffering problems. I have been beating my head around the keyboard for weeks, This is so hard to fix. Any help would be appreciated.

Code:

Encoder setup:

IMFAttributes* attributes = 0;
    HRESULT  hr = MFCreateAttributes(&attributes, 0);

    if (attributes)
    {
        //attributes->SetUINT32(MF_SINK_WRITER_DISABLE_THROTTLING, TRUE);
        attributes->SetGUID(MF_TRANSCODE_CONTAINERTYPE, MFTranscodeContainerType_MPEG4);
    }//end if (attributes)

    hr = MFCreateMediaType(&pMediaTypeOut);
    // Set the output media type.
    if (SUCCEEDED(hr))
    {
        hr = MFCreateMediaType(&pMediaTypeOut);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetGUID(MF_MT_SUBTYPE, cVideoEncodingFormat); // MFVideoFormat_H264
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_AVG_BITRATE, VIDEO_BIT_RATE); //18000000
    }
    if (SUCCEEDED(hr))
    {
        hr = MFSetAttributeRatio(pMediaTypeOut, MF_MT_FRAME_RATE, VIDEO_FPS, 1); // 30
    }
    if (SUCCEEDED(hr))
    {
        hr = MFSetAttributeSize(pMediaTypeOut, MF_MT_FRAME_SIZE, mStreamWidth, mStreamHeight);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_High);
    }

    if (SUCCEEDED(hr))
    {
        hr = MFSetAttributeRatio(pMediaTypeOut, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
    }

    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_MAX_KEYFRAME_SPACING, 16);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(CODECAPI_AVEncCommonRateControlMode, eAVEncCommonRateControlMode_UnconstrainedVBR);//eAVEncCommonRateControlMode_Quality, eAVEncCommonRateControlMode_UnconstrainedCBR);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(CODECAPI_AVEncCommonQuality, 100);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_FIXED_SIZE_SAMPLES, FALSE);
    }
    if (SUCCEEDED(hr))
    {
        BOOL allSamplesIndependent = TRUE;
        hr = pMediaTypeOut->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, allSamplesIndependent);
    }
    if (SUCCEEDED(hr))
    {
        hr = pMediaTypeOut->SetUINT32(MF_MT_COMPRESSED, TRUE);
    }

    if (SUCCEEDED(hr))
    {
        hr = mpEncoder->SetOutputType(0, pMediaTypeOut, 0);
    }

// Process the incoming sample. Ignore the timestamp & duration parameters, we just render the data in real-time.

HRESULT ProcessSample(IMFSample **ppSample, LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn)
{
    IMFMediaBuffer *buffer = nullptr;
    DWORD bufferSize;
    HRESULT hr = S_FALSE;

    if (ppSample)
    {
        hr = (*ppSample)->ConvertToContiguousBuffer(&buffer);

        if (SUCCEEDED(hr))
        {
            buffer->GetCurrentLength(&bufferSize);

            hr = ProcessInput(ppSample);

            if (SUCCEEDED(hr))
            {
                //hr = mpDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL);

                //if (SUCCEEDED(hr)) 
                {
                    while (hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
                    {
                        hr = ProcessOutput(time, duration, oDtn);
                    }
                }
            }
            else
            {
                if (hr == MF_E_NOTACCEPTING)
                {
                    while (hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
                    {
                        hr = ProcessOutput(time, duration, oDtn);
                    }

                }
            }
        }

    }

    return (hr == MF_E_TRANSFORM_NEED_MORE_INPUT ? (oDtn.numBytes > 0 ? oDtn.returnCode : hr) : hr);
}

// Finds and returns the h264 MFT (given in subtype parameter) if available...otherwise fails.

HRESULT FindDecoder(const GUID& subtype)
{
    HRESULT hr = S_OK;
    UINT32 count = 0;

    IMFActivate  **ppActivate = NULL;

    MFT_REGISTER_TYPE_INFO info = { 0 };

    UINT32 unFlags = MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_ASYNCMFT;

    info.guidMajorType = MFMediaType_Video;
    info.guidSubtype = subtype;

    hr = MFTEnumEx(
        MFT_CATEGORY_VIDEO_DECODER,
        unFlags,
        &info,
        NULL,
        &ppActivate,
        &count
    );

    if (SUCCEEDED(hr) && count == 0)
    {
        hr = MF_E_TOPO_CODEC_NOT_FOUND;
    }

    if (SUCCEEDED(hr))
    {
        hr = ppActivate[0]->ActivateObject(IID_PPV_ARGS(&mpDecoder));
    }

    CoTaskMemFree(ppActivate);
    return hr;
}

// reconstructs the sample from encoded data

HRESULT ProcessData(char *ph264Buffer, DWORD bufferLength, LONGLONG& time, LONGLONG& duration, TransformOutput &dtn)
{
    dtn.numBytes = 0;
    dtn.pData = NULL;
    dtn.returnCode = S_FALSE;

    IMFSample *pSample = NULL;
    IMFMediaBuffer *pMBuffer = NULL;

    // Create a new memory buffer.
    HRESULT hr = MFCreateMemoryBuffer(bufferLength, &pMBuffer);

    // Lock the buffer and copy the video frame to the buffer.
    BYTE *pData = NULL;
    if (SUCCEEDED(hr))
        hr = pMBuffer->Lock(&pData, NULL, NULL);

    if (SUCCEEDED(hr))
        memcpy(pData, ph264Buffer, bufferLength);

    pMBuffer->SetCurrentLength(bufferLength);
    pMBuffer->Unlock();

    // Create a media sample and add the buffer to the sample.
    if (SUCCEEDED(hr))
        hr = MFCreateSample(&pSample);

    if (SUCCEEDED(hr))
        hr = pSample->AddBuffer(pMBuffer);

    LONGLONG sampleTime = time - mStartTime;

    // Set the time stamp and the duration.
    if (SUCCEEDED(hr))
        hr = pSample->SetSampleTime(sampleTime);

    if (SUCCEEDED(hr))
        hr = pSample->SetSampleDuration(duration);

    hr = ProcessSample(&pSample, sampleTime, duration, dtn);

    ::Release(&pSample);
    ::Release(&pMBuffer);

    return hr;
}

// Process the output sample for the decoder

HRESULT ProcessOutput(LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn/*output*/)
{
    IMFMediaBuffer *pBuffer = NULL;
    DWORD mftOutFlags;
    MFT_OUTPUT_DATA_BUFFER outputDataBuffer;
    IMFSample *pMftOutSample = NULL;
    MFT_OUTPUT_STREAM_INFO streamInfo;

    memset(&outputDataBuffer, 0, sizeof outputDataBuffer);

    HRESULT hr = mpDecoder->GetOutputStatus(&mftOutFlags);
    if (SUCCEEDED(hr))
    {
        hr = mpDecoder->GetOutputStreamInfo(0, &streamInfo);
    }

    
    if (SUCCEEDED(hr))
    {
        hr = MFCreateSample(&pMftOutSample);
    }

    
    if (SUCCEEDED(hr))
    {
        hr = MFCreateMemoryBuffer(streamInfo.cbSize, &pBuffer);
    }

    
    if (SUCCEEDED(hr))
    {   
        hr = pMftOutSample->AddBuffer(pBuffer);
    }
    
    if (SUCCEEDED(hr))
    {
        DWORD dwStatus = 0;

        outputDataBuffer.dwStreamID = 0;
        outputDataBuffer.dwStatus = 0;
        outputDataBuffer.pEvents = NULL;
        outputDataBuffer.pSample = pMftOutSample;

        hr = mpDecoder->ProcessOutput(0, 1, &outputDataBuffer, &dwStatus);
    }

    if (SUCCEEDED(hr))
    {
        hr = GetDecodedBuffer(outputDataBuffer.pSample, outputDataBuffer, time, duration, oDtn);
    }

    if (pBuffer)
    {
        ::Release(&pBuffer);
    }

    if (pMftOutSample)
    {
        ::Release(&pMftOutSample);
    }

    return hr;
}

// Write the decoded sample out

HRESULT GetDecodedBuffer(IMFSample *pMftOutSample, MFT_OUTPUT_DATA_BUFFER& outputDataBuffer, LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn/*output*/)
{
    // ToDo: These two lines are not right. Need to work out where to get timestamp and duration from the H264 decoder MFT.
    HRESULT hr = outputDataBuffer.pSample->SetSampleTime(time);
    
    if (SUCCEEDED(hr))
    {
        hr = outputDataBuffer.pSample->SetSampleDuration(duration);
    }
    

    if (SUCCEEDED(hr))
    {
        hr = pMftOutSample->ConvertToContiguousBuffer(&pDecodedBuffer);
    }
    
    if (SUCCEEDED(hr))
    {
        DWORD bufLength;
        hr = pDecodedBuffer->GetCurrentLength(&bufLength);
    }
    
    if (SUCCEEDED(hr))
    {
        byte *pEncodedYUVBuffer;
        DWORD buffCurrLen = 0;
        DWORD buffMaxLen = 0;
        pDecodedBuffer->GetCurrentLength(&buffCurrLen);
        pDecodedBuffer->Lock(&pEncodedYUVBuffer, &buffMaxLen, &buffCurrLen);
        ColorConversion::YUY2toRGBBuffer(pEncodedYUVBuffer, 
                                        buffCurrLen, 
                                        mpRGBABuffer,
                                        mStreamWidth,
                                        mStreamHeight,
                                        mbEncodeBackgroundPixels,
                                        mChannelThreshold);

        pDecodedBuffer->Unlock();       
        ::Release(&pDecodedBuffer);

        oDtn.pData = mpRGBABuffer;
        oDtn.numBytes = mStreamWidth * mStreamHeight * 4;
        oDtn.returnCode = hr; // will be S_OK..
    }
        
    return hr;
}

Update: The decoder's output is satisfactory now after enabling CODECAPI_AVLowLatency Mode, But with 2 seconds delay in the stream compared to the sender, I'm able to achieve 15 to 20fps that's a lot better compared to previous. The quality detoriates when there are more number of changes pushed from the source to the encoder. I'm yet to implement hardware accelerated decoding.

Update2: I figured out that the timestamp and duration settings are the ones that affect the quality of the video if set improperly. The thing is, my image source does not emit frames at a constant rate, but it looks like the encoder and decoders expect constant frame rate. When I set the duration as constant and increment the sample time in constant steps the video quality seems to be better but not the greatest though. I don't think what I'm doing is the correct approach. Is there any way to specify the encoder and decoder about the variable frame rate?

Update3: I'm able to get acceptable performance from both encoders and decoders after setting CODECAPI_AVEncMPVDefaultBPictureCount (0), and CODECAPI_AVEncCommonLowLatency properties. Yet to explore hardware accelerated decoding. I hope I would be able to achieve the best performance if hardware decoding is implemented.

The quality of the video is still poor, edges & curves are not sharp. Text looks blurred, and it's not acceptable. The quality is okay for videos and images but not for texts and shapes.

Update4

It seems some of the color information is getting lost in the YUV subsampling phase. I tried converting the RGBA buffer to YUV2 and then back, The color loss is visible but not bad though. The loss due to YUV conversion is not as bad as the quality of the image that is being rendered after RGBA-> YUV2 -> H264 -> YUV2 -> RGBA conversion. It's evident that not just YUV2 conversion the sole reason for the loss of quality but also the H264 encoder that further causes aliasing. I would still have obtained a better video quality if the H264 encode doesn't introduce aliasing effects. I'm going to explore WMV CODECs. The only thing that still bothers me is this, the code works pretty well and is able to capture the screen and save the stream in mp4 format in a file. The only difference here is that I'm using Media foundation transform with MFVideoFormat_YUY2 input format compared to the sink writer approach with MFVideoFormat_RGB32 as input type in the mentioned code. I still have some hope that it is possible to achieve better quality through Media Foundation itself. The thing is MFTEnum/ProcessInput fails if I specify MFVideoFormat_ARGB32 as input format in MFT_REGISTER_TYPE_INFO (MFTEnum)/SetInputType respectively.

Original:

Decoded image (After RGBA -> YUV2 -> H264 -> YUV2 -> RGBA conversion):

Click to open in the new tab to view the full image so that you can see the aliasing effect.

I believe you should implement hardware decoding (using https://learn.microsoft.com/en-us/windows/desktop/api/mfobjects/nn-mfobjects-imfdxgidevicemanager) first and then measure the FPS. Note that decoder starts the actual decoding after it first encounters the SPS and PPS frames. You can also try CODECAPI_AVLowLatencyMode: https://learn.microsoft.com/en-us/windows/desktop/medfound/h-264-video-decoder — VuVirt, Mar 05 '19 at 08:49
I understand that decoder MFT starts decoding only when it encounters first SPS and PPS. It's okay for me if it buffers only first few frames, but my decoder implementation outputs only when it's buffer is full all the time even after the SPS and PPS parsing phase. I'm working on hardware-assisted decoding right now. — iamrameshkumar, Mar 05 '19 at 09:25
Btw, you shouldn't omit timestamps on decoder's input samples. This might be the reason for your problem. Try to figure them out from the elapsed time or if you know the H264 stream FPS in advance. Let me know how it works with accelerated decoder. — VuVirt, Mar 05 '19 at 09:38
Are you doing encoding/decoding synchronously ? I would do it asynchronously, as would a media session. — mofo77, Mar 06 '19 at 20:54
Yes, I'm doing encoding/decoding synchronously. I'm yet to explore asynchronous approaches. — iamrameshkumar, Mar 07 '19 at 05:48
@Ram you can set a (much) higher value for the video bitrate to achieve better quality — VuVirt, Mar 07 '19 at 18:29
@VuVirt I think I have found the root cause. I'm feeding RGBA samples to an RGBA to packed YUV2 convertor before H264 conversion and vice versa in the decoder which I blindly followed from https://github.com/GameTechDev/ChatHeads/blob/master/VideoStreaming/EncodeTransform.cpp. I'm suspecting the YUV2 conversion process. I'm yet to verify it. I strongly believe it's YUV conversion process that leads to aliasing effects as I mentioned. — iamrameshkumar, Mar 07 '19 at 22:45
I'm trying to explore other legit methods to convert my RGBA buffer to YUV so that I could pass the same to H264 transform. I'm exploring Windows Color Converter DSP and SIMD (SSE2/SSE3/Vector instruction set) approaches so that I could achieve the best frame rate. — iamrameshkumar, Mar 07 '19 at 22:56
With hardware deciding you’ll have a h264 buffer at the input and will receive an nv12 texture at the output. The nv12 texture can be converted to rgba texture using GPU by the means if dx 11 video processor or even using Video Processor MFT. — VuVirt, Mar 07 '19 at 23:11
Good. Do your RGBA->YUV optimization, and tell us if it is ok. — mofo77, Mar 07 '19 at 23:26
Is it possible to show an _Input_ image vs _Output_ image (H264 frame grab)? — VC.One, Mar 11 '19 at 00:15
@VC.One, I have updated the post with the input & output images. — iamrameshkumar, Mar 11 '19 at 08:31

Markus Schumann · Answer 1 · 2019-03-12T15:05:12.773

4

Most consumer H.264 encoders sub-sample the color information to 4:2:0. (RGB to YUV) This means before the encode process even starts your RGB bitmap losses 75% of the color information. H.264 was more designed for natural content rather than screen capture. But there are codecs that are specifically designed to achieve good compression for screen content. For example: https://learn.microsoft.com/en-us/windows/desktop/medfound/usingthewindowsmediavideo9screencodec Even if you increase the bitrate of your H.264 encode - you are working only with 25% of the original color information to start with.

So your format changes look like this:

You start with 1920x1080 red, green and blue pixels. You transform to YUV. Now you have 1920x1080 luma, Cb and Cr. where Cb and Cr are color difference components. This is just a different way of representing colors. Now you scale the Cb and Cr plane to 1/4 of their original size. So your resulting Cb and Cr channels are around 960x540 and your luma plane is still 1920x1080. By scaling your color information from 1920x1080 to 960x540 - you are down to 25% of the original size. Then the full size luma plane and 25% color difference channels are passed into the encoder. This level of reducing the color information is called subsampling to 4:2:0. The subsampled input is required by the encoder and is done automatically by the media framework. There is not much you can do to escape it - outside from choosing a different format.

R = red
G = green
B = blue

Y = luminescence
U = blue difference  (Cb)
V = red difference  (Cr)

YUV is used to separate out a luma signal (Y) that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (U and V) that can be bandwidth-reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency. (Wikipedia)

Original format

RGB (4:4:4) 3 bytes per pixel

R  R  R  R   R  R  R  R    R  R  R  R   R  R  R  R
G  G  G  G   G  G  G  G    G  G  G  G   G  G  G  G
B  B  B  B   B  B  B  B    B  B  B  B   B  B  B  B

Encoder input format - before H.264 compression

YUV (4:2:0) 1.5 bytes per pixel (6 bytes per 4 pixel)

Y  Y  Y  Y   Y  Y  Y  Y   Y  Y  Y  Y   Y  Y  Y  Y
    UV           UV           UV           UV

edited Mar 12 '19 at 15:05

answered Mar 08 '19 at 01:52

Markus Schumann

7,636
1
21
27

2

if RGB=0, do you loose 75% of the color information ? – mofo77 Mar 11 '19 at 01:34
1

I am sorry but I don't understand your question. Please checkout this article: https://connect.teradici.com/blog/the-importance-of-lossless-support – Markus Schumann Mar 11 '19 at 17:24
@MarkusSchumann, WMV9 codec is a piece of good information. I did some research on it but failed to figure out a useful & working sample. It would be more helpful if you could direct me to some sample code. – iamrameshkumar Mar 11 '19 at 17:45
Is WMV deprecated in favor of WMF or Expression encoders?, I couldn't find the WMV SDK in Microsoft's site. – iamrameshkumar Mar 11 '19 at 17:56
@MarkusSchumann, I mean if R=0, G=0, B=0, the "standard calculation" will give Y=0, U=0, V=0. So the reverse operation will give R=0, G=0, B=0. In this case, we loose 0 % of color information. So where does come from the 75% ? That's a question. – mofo77 Mar 11 '19 at 23:40
@mofo77 U and V are calculated from at least 4 for RGB values and this is a lossy operation. – Markus Schumann Mar 12 '19 at 15:10
@mofo77 : it looks like your screen is captured without antialiasing. https://learn.microsoft.com/en-us/windows/desktop/direct3d9/antialiasing I wonder if your graphics hardware does the antialiasing but the buffer you get is not antialiased. – Markus Schumann Mar 13 '19 at 11:08
For me antialiasing is for 3D rendering. But i will look if for capture, there is an impact – mofo77 Mar 13 '19 at 18:55
No impact : https://learn.microsoft.com/en-us/windows/desktop/api/d3d9/nf-d3d9-idirect3ddevice9-getfrontbufferdata : The buffer pointed to by pDestSurface will be filled with a representation of the front buffer, converted to the standard 32 bits per pixel format D3DFMT_A8R8G8B8. This method is the only way to capture an antialiased screen shot. I tested different multisample level, and i see no diffrence. – mofo77 Mar 13 '19 at 23:00
@mofo77 - Have you captured the screen and just stored the bitmap locally - uncompressed as BMP file? – Markus Schumann Mar 14 '19 at 00:02
For me the captured screen buffer is of high quality, There is no aliasing effect until the start of h264 encoder step. YUV conversion introduces little color loss but there are no distortions like aliasing. – iamrameshkumar Mar 14 '19 at 08:34
@MarkusSchumann, yes, this is D3DFMT_A8R8G8B8 format, very good quality. I think on windows7 it 's the same image you get when doing "keyboard print screen", because i think Microsft uses GetFrontBufferData also. – mofo77 Mar 14 '19 at 20:26
@mofo77 - One more guess - I think the deblocking and/or sharpening could be counter acting the antialiasing - see if you can turn it off for encoding and decoding. "In contrast with older MPEG-1/2/4 standards, the H.264 deblocking filter is not an optional additional feature in the decoder. It is a feature on both the decoding path and on the encoding path, so that the in-loop effects of the filter are taken into account in reference macroblocks used for prediction." https://en.wikipedia.org/wiki/Deblocking_filter – Markus Schumann Mar 15 '19 at 17:14
@MarkusSchumann, you said what must be said : H.264 was more designed for natural content rather than screen capture. The problem comes with text inside capture. – mofo77 Mar 16 '19 at 00:21

mofo77 · Answer 2 · 2019-03-18T23:47:31.350

I'm trying to understand your problem.

My program ScreenCaptureEncode uses default Microsoft encoder settings :

Profile : baseline
Level : 40
CODECAPI_AVEncCommonQuality : 70
Bitrate : 2000000

From my results, i think quality is good/acceptable.

You can change profile/level/bitrate with MF_MT_MPEG2_PROFILE/MF_MT_MPEG2_LEVEL/MF_MT_AVG_BITRATE. For CODECAPI_AVEncCommonQuality, it seems like you are trying to use a locally registered encoder, because you're on Win7, to set that value to 100, I guess.

but I do not think that will change things significantly.

So.

here is 3 screenshots with keyboard print screen :

the screen
the encoded screen, playing by a video player in fullscreen mode
the encoded screen, playing by a video player in a non fullscreen mode

The two last pictures are from the same video encoded file. The video player introduces aliasing when not playing in fullscreen mode. The same encoded file playing in fullscreen mode is not so bad, comparing to the original screen, and with default encoder settings. You should try this. I think we have to look at this more closely.

I think aliasing comes from your video player, and because not playing in fullscreen mode.

PS : I use the video player MPC-HC.

PS2: my program needs to be improved :

(not sure) use IDirect3D9Ex to improve buffered mechanism. On Windows7, for rendering, IDirect3D9Ex is better (no swap buffer). Perhaps it's the same for capture screen (todo list).
I should use two threads, one for capture screen, and one for encoding.

EDIT

Did you read this :

CODECAPI_AVLowLatencyMode

Low-latency mode is useful for real-time communications or live capture, when latency should be minimized. However, low-latency mode might also reduce the decoding or encoding quality.

About why my program using MFVideoFormat_RGB32 and yours using MFVideoFormat_YUY2. By default, SinkWriter has converters enable. The SinkWriter converts MFVideoFormat_RGB32 to a compatible h264 encoder format. For Microsoft encoder, read this : H.264 Video Encoder

Input format :

MFVideoFormat_I420
MFVideoFormat_IYUV
MFVideoFormat_NV12
MFVideoFormat_YUY2
MFVideoFormat_YV12

So there is no MFVideoFormat_RGB32. The SinkWriter does the conversion using the Color Converter DSP, I think.

so definitely, the problem does not come from converting rgb to yuv, before encoding.

PS (last)

like Markus Schumann said ;

H.264 was more designed for natural content rather than screen capture.

He should have mentioned that the problem is particularly related to text capture.

You just have found encoder limitation. I just think that no encoder is optmized for text encoding, with an acceptable streching, like I mention with video player rendering.

You see aliasing on final video capture, because it is fixed information inside the movie. Playing this movie in fullscreen (same as capture) is OK.

On Windows, text is calculate according to the screen resolution. So display is always good.

this is my last conclusion.

how did you find out profile and level? I'm trying to extract that information but can't seem find a way to get those — M. Ather Khan, Aug 11 '20 at 06:56

iamrameshkumar · Accepted Answer · 2019-11-26T12:17:35.557

After so much research and effort, the problems are fixed. Color quality problem was due to the software-based color conversion which leads to aliasing (RGB to YUV in the encoder and back at the decoder). Using a hardware-accelerated color convertor solved the aliasing and image quality problems.

Setting optimal values to CODECAPI_AVEncMPVGOPSize, CODECAPI_AVEncMPVDefaultBPictureCount and CODECAPI_AVEncCommonLowLatency properties solved the buffering problems.

Windows Media Foundation MFT buffering and video quality issues (Loss of colors, not so smooth curves, especially text)

3 Answers3