Multithreading the render and event loop in a Windows app: what should be done to avoid glitches?

Question

I started looking into multithreading the render loop because I wanted the client area to keep updating while the user holds the title bar, or holds the mouse on a border to resize the window without moving the mouse. The render loop redraws the content of the window using either Vulkan or DirectX, so it uses a swap chain and the present mechanism to get the latest frame onto the screen.

I more or less got this working (more on this later). But I also want to ensure the rendering is as clean as possible when the window is resized. As you all know if you have dealt with this before, Windows generally draws garbage where new pixels are exposed when you resize (enlarge the window).

That seems to be a recurring question on Stack Overflow, and there are some rather well-written and interesting answers, such as this one, which already goes into a lot of depth:

How to smooth ugly jitter/flicker/jumping when resizing windows, especially dragging left/top border (Win 7-10; bg, bitblt and DWM)?

However:

As the answer's author mentions, they mostly made guesses about what Windows is doing. So I would be interested to see an answer to this problem that is not based on guesswork (if someone has that knowledge).
The answers provide insights but no solution.
Furthermore, the question is not being specifically asked within the context of a multi-threaded app.

Now, the answers suggest that Windows probably expects the application to redraw the client area within 1/60th of a second (assuming a 60 Hz refresh rate for the screen). My understanding from the various posts (I don't know Windows well, I have to say) is also that it is better to handle the WM_WINDOWPOSCHANGING message rather than WM_SIZE or WM_SIZING, because once you return from WM_WINDOWPOSCHANGING, Windows starts waiting for you to redraw the client area before doing it itself.

So my solution was to do this in the main thread (the main thread that deals with Windows messages):

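case WM_WINDOWPOSCHANGING:
{
    // Block until the render thread is not in the middle of a frame.
    std::unique_lock lock(m);
    cv.wait(lock, []() { return is_drawing == false; });
    is_resizing = true;
    
    // Note: the change has not been applied yet when this message arrives,
    // so this reads the current client size; the proposed size is in the
    // WINDOWPOS structure pointed to by lParam.
    RECT client_rect;
    GetClientRect(hwnd2, &client_rect);
    window_width = client_rect.right - client_rect.left;
    window_height = client_rect.bottom - client_rect.top;

    // Force a frame before returning from WindowProc.
    draw();

    // Let the render thread resume.
    is_resizing = false;
    lock.unlock();
    cv.notify_one();
    return 0;
}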

and that in the render thread:

void render_func()
{
    while (keep_running)
    {

        std::unique_lock lock(m);
        cv.wait(lock, []() { return is_resizing == false; });
        is_drawing = true;

        draw();

        is_drawing = false;
        lock.unlock();
        cv.notify_one();
    }
}

void draw()
{
   if (size_changed)
       recreate_swapchain(window_width , window_height);
   acquire_texture_view_from_swapchain();
   do_GPU_magic();
   present(); // swap buffers
}

I am using a condition_variable to make WindowProc wait if the render thread is drawing. Once the render thread is not drawing, we compute the client area size, force a draw so that drawing happens before we return from WindowProc, and then set the is_resizing flag to false to signal the render thread that it can resume rendering normally.
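To make the handshake easier to reason about, here is a minimal portable sketch of the same pattern with the Win32 and GPU parts removed. The namespace `handshake_sketch`, the function `on_resize()` (standing in for the WM_WINDOWPOSCHANGING handler), the `run()` driver and the overlap counters are assumptions added for illustration; `draw()` only records whether two threads were ever inside it at once. One thing the sketch makes visible: because each flag is set and cleared while the mutex is held, and assuming both flags start as false, neither wait predicate can ever observe the other thread's flag as true, so the mutex alone is what serializes the two draw() calls.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

namespace handshake_sketch {

std::mutex m;
std::condition_variable cv;
bool is_resizing = false;              // guarded by m
bool is_drawing = false;               // guarded by m
std::atomic<bool> keep_running{true};
std::atomic<int> inside_draw{0};       // threads currently inside draw()
std::atomic<bool> overlapped{false};   // set if two draws ever ran at once

// Stand-in for the real draw(): only checks for concurrent entry.
void draw() {
    if (inside_draw.fetch_add(1) != 0) overlapped = true;
    inside_draw.fetch_sub(1);
}

// Same shape as the WM_WINDOWPOSCHANGING handler in the question.
void on_resize() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return !is_drawing; });
    is_resizing = true;
    draw();
    is_resizing = false;
    lock.unlock();
    cv.notify_one();
}

// Same shape as render_func() in the question.
void render_func() {
    while (keep_running) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !is_resizing; });
        is_drawing = true;
        draw();
        is_drawing = false;
        lock.unlock();
        cv.notify_one();
        std::this_thread::yield();   // give the other thread a chance at the mutex
    }
}

// The caller plays the message thread. Returns true if draws never overlapped.
bool run(int resize_events) {
    keep_running = true;
    overlapped = false;
    std::thread render(render_func);
    for (int i = 0; i < resize_events; ++i) on_resize();
    keep_running = false;
    render.join();
    return !overlapped;
}

}  // namespace handshake_sketch
```

Calling `handshake_sketch::run(100)` exercises both paths concurrently. The sketch says nothing about when DWM composes the result; it only shows that the two draw paths are mutually exclusive.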

The reason I came up with all of this is that, as I understood the referenced post, Windows expects you to redraw the client area within roughly 16 ms, and if you don't do so in that time frame, it does it for you (with whatever means it can come up with: background color, garbage, etc.). So forcing a draw() call before returning from WindowProc should satisfy that. I also understand that with DWM, the redraw by Windows is asynchronous, so it seems you cannot know for sure when Windows will actually paint into the client area.

To be sure the draw call is very quick, the only thing I do is clear the buffer with a plain color. The window's background color at creation is red. When I call draw() from WindowProc, I set the clear color of the buffer rendered via Vulkan / DirectX to blue. When the window's content is rendered by the render thread, the clear color is green.

Also, the present mode is set to immediate, meaning the buffer should be presented to the surface as soon as possible. So I should easily meet the 16 ms requirement (I timed 465 microseconds for the entire process).
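For reference, the arithmetic behind the "16 ms" figure and the comparison with the measured 465 microseconds can be sketched as follows. The helper names `frame_budget_us`, `fits_in_frame` and `measure_us` are made up for this example. Note that a CPU-side timer like this only measures how long the calls take to return on the calling thread; it does not tell you when the GPU finishes the frame or when DWM composes it.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Time available per refresh, in microseconds, for a given refresh rate.
constexpr double frame_budget_us(double refresh_hz) {
    return 1000000.0 / refresh_hz;
}

// True if a measured duration fits within one refresh interval.
constexpr bool fits_in_frame(double elapsed_us, double refresh_hz) {
    return elapsed_us < frame_budget_us(refresh_hz);
}

// Measures how long a callable takes on the calling thread, in microseconds.
template <class F>
double measure_us(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

At 60 Hz the budget is 1,000,000 / 60, roughly 16,667 microseconds, so a 465 microsecond draw uses under 3% of it. In the program, the measurement would wrap the draw call, for example `measure_us([] { draw(0, 0, 1); })`.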

Interestingly, I do get the "expected" results: it's green when I do nothing, and blue when I resize the window. The redraw is also perfectly smooth, with no garbage drawn, and of course the content is redrawn even when I move the window. Super.

Except that I occasionally get a fully red window. This presumably means that the code sometimes fails to draw at the right time, and Windows decides to fill the entire client area with the window's initial background color.

I have no idea what else to try, and it already feels quite hacky. It seems I am in muddy territory here; I am not sure it is even possible to do this reliably (because of that asynchronous DWM process). I don't think many people have needed to tackle this problem before, even though in 2023 I'd think this would be a common requirement.

Do you have feedback on the approach I have chosen? Have you managed to get this to work somehow? If you could share your solution, it would be greatly appreciated.

EDIT 1

Following @SimonMourier's request, here is some code. I can't share the D3D12 code for business reasons, but I can share an example I put together using Dawn, which we have been testing internally. I am not sure how useful this is if you don't have the Dawn libs at hand, but they are not difficult to build. Here is the code:

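#ifndef UNICODE
#define UNICODE // must be defined before <Windows.h> to select the wide-character API
#endif
#include <Windows.h>

#include <algorithm>          // std::find_if
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>            // rand
#include <iostream>
#include <memory>             // std::unique_ptr
#include <mutex>
#include <thread>
#include <vector>

#include "dawn/webgpu_cpp.h"
#include "dawn/dawn_proc.h"
#include "dawn/native/DawnNative.h"

dawn_native::Instance instance;
wgpu::Device device;
wgpu::Queue queue;
wgpu::Surface surface;
wgpu::SwapChain swapChain;

// Written by the message thread, read by whichever thread calls draw().
std::atomic<uint32_t> window_width { 640 };
std::atomic<uint32_t> window_height { 480 };

using namespace std::chrono_literals;

std::atomic<bool> keep_running = true;

std::mutex m;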

HWND hwnd2;

void draw(float r, float g, float b);

bool is_resizing = false;
bool is_drawing = true;
std::condition_variable cv;

wgpu::PresentMode present_mode = wgpu::PresentMode::Fifo;

LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uMsg)
    {
        case WM_DESTROY:
            PostQuitMessage(0);
            keep_running = false;
            return 0;

        case WM_WINDOWPOSCHANGING:
        case WM_WINDOWPOSCHANGED:
        {
            if (!surface)
                return 0;
            
            RECT client_rect;
            GetClientRect(hwnd2, &client_rect);
            window_width = client_rect.right - client_rect.left;
            window_height = client_rect.bottom - client_rect.top;
#ifdef MULTITHREADED
            std::unique_lock lock(m);
            cv.wait(lock, []() { return is_drawing == false; });
            is_resizing = true;
#endif
            
            draw(0,0,1);

#ifdef MULTITHREADED
            is_resizing = false;
            lock.unlock();
            cv.notify_one();
#endif
            return 0;
        }
        
    }
    return DefWindowProc(hwnd, uMsg, wParam, lParam);
}

void create_window()
{
    const wchar_t CLASS_NAME[] = L"MyWindowClass";

    WNDCLASS wc = {};

    wc.lpfnWndProc = WindowProc;
    wc.hInstance = GetModuleHandle(nullptr);
    wc.lpszClassName = CLASS_NAME;
    wc.hbrBackground = CreateSolidBrush(RGB(255, 0, 0));
    wc.style = CS_HREDRAW | CS_VREDRAW;

    RegisterClass(&wc);
    
    DWORD dwStyle = WS_OVERLAPPEDWINDOW;
    RECT rc = { 0, 0, (int32_t)window_width, (int32_t)window_height };
    AdjustWindowRectEx(&rc, dwStyle, FALSE, 0);

    hwnd2 = CreateWindowEx(
        0,
        CLASS_NAME,
        L"",
        dwStyle,
        CW_USEDEFAULT, CW_USEDEFAULT,
        rc.right - rc.left, rc.bottom - rc.top,
        NULL,
        NULL,
        GetModuleHandle(nullptr),
        NULL);
        
    assert(hwnd2 != nullptr);
        
    ShowWindow(hwnd2, SW_SHOW);
}


// END OF WINDOWS STUFF

std::unique_ptr<wgpu::ChainedStruct> SetupWindowAndGetSurfaceDescriptor() {
    std::unique_ptr<wgpu::SurfaceDescriptorFromWindowsHWND> desc =
        std::make_unique<wgpu::SurfaceDescriptorFromWindowsHWND>();
    desc->hwnd = hwnd2;
    desc->hinstance = GetModuleHandle(nullptr);
    return std::move(desc);
}


wgpu::Surface CreateSurfaceForWindow(const wgpu::Instance& instance) {
    std::unique_ptr<wgpu::ChainedStruct> chainedDescriptor =
        SetupWindowAndGetSurfaceDescriptor();

    wgpu::SurfaceDescriptor descriptor;
    descriptor.nextInChain = chainedDescriptor.get();
    wgpu::Surface surface = instance.CreateSurface(&descriptor);

    return surface;
}

void init()
{
    instance.DiscoverDefaultAdapters();

    std::vector<dawn::native::Adapter> adapters = instance.GetAdapters();
    auto adapterIt = std::find_if(adapters.begin(), adapters.end(),
        [](const dawn::native::Adapter adapter) -> bool {
            wgpu::AdapterProperties properties;
            adapter.GetProperties(&properties);
            return properties.backendType == wgpu::BackendType::Vulkan;
        });
    if (adapterIt == adapters.end()) {
        return;
    }

    dawn::native::Adapter chosenAdapter = *adapterIt;

    DawnProcTable procs(dawn_native::GetProcs());
    dawnProcSetProcs(&procs);

    device = wgpu::Device::Acquire(chosenAdapter.CreateDevice());

    queue = device.GetQueue();
    
    surface = CreateSurfaceForWindow(instance.Get());

    wgpu::SwapChainDescriptor swapChainDesc = {
        .usage = wgpu::TextureUsage::RenderAttachment,
        .format = wgpu::TextureFormat::BGRA8Unorm,
        .width = window_width,
        .height = window_height,
        .presentMode = present_mode,
    };

    swapChain = device.CreateSwapChain(surface, &swapChainDesc);
}

int w = 0, h = 0;

void draw(float r, float g, float b)
{
    if (!surface)
        return;
    if (w != window_width || h != window_height) {
        w = window_width, h = window_height;
        wgpu::SwapChainDescriptor swapChainDesc = {
            .usage = wgpu::TextureUsage::RenderAttachment,
            .format = wgpu::TextureFormat::BGRA8Unorm,
            .width = window_width,
            .height = window_height,
            .presentMode = present_mode,
        };
        
        swapChain = device.CreateSwapChain(surface, &swapChainDesc);
    }
    wgpu::TextureView backBuffer = swapChain.GetCurrentTextureView();   
    wgpu::CommandEncoder encoder = device.CreateCommandEncoder();
    
    wgpu::RenderPassColorAttachment renderPassColorAttachment = {
        .view = backBuffer,
        .resolveTarget = nullptr,
        .loadOp = wgpu::LoadOp::Clear,
        .storeOp = wgpu::StoreOp::Store,
        .clearValue = {r, g, b ,1},
    };
    
    wgpu::RenderPassDescriptor renderPassDescriptor = {
        .colorAttachmentCount = 1,
        .colorAttachments = &renderPassColorAttachment,
        .depthStencilAttachment = nullptr,
    };
    
    wgpu::RenderPassEncoder pass = encoder.BeginRenderPass(&renderPassDescriptor);

    pass.End();

    wgpu::CommandBuffer command = encoder.Finish();

    queue.Submit(1, &command);

    swapChain.Present();
}

void render_func()
{
    while (keep_running)
    {
        std::unique_lock lock(m);
        cv.wait(lock, []() { return is_resizing == false; });
        is_drawing = true;
        
        draw(0,rand() / (float)RAND_MAX,0);
        
        is_drawing = false;
        lock.unlock();
        cv.notify_one();
        
        // give a chance to the event loop to process messages
        std::this_thread::sleep_for(4ms);
    }
}

void handleWindowsEvent(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{}

int main()
{
    create_window();
    
    init();

#ifdef MULTITHREADED
    std::thread render_thread(render_func);
#endif
    MSG msg = {};
    while (keep_running)
    {
        if (PeekMessage(&msg, hwnd2, 0U, 0U, PM_REMOVE)) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
#ifndef MULTITHREADED
        draw(0,rand() / (float)RAND_MAX,0);
#endif

    }
    
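#ifdef MULTITHREADED
    // keep_running is false at this point; wait for the render thread to exit.
    // Destroying a std::thread that is still joinable calls std::terminate.
    render_thread.join();
#endif
    return 0;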
}

Compiled with:

clang++ -O3 -std=c++20 -I$DAWN_PATH/dawn/include -I$DAWN_PATH/dawn/out/Release/gen/include -I$DAWN_PATH/dawn/src/dawn/ -L$DAWN_PATH/dawn/out/Release -ldawn_native.dll -ldawn_proc.dll -ldawn_platform.dll webgpu_cpp.o Source.cpp -DMULTITHREADED -lUser32 -lGdi32 -DUNICODE

Note: I had to put a small sleep in the render loop, otherwise the event loop never gets a chance to grab the lock. Now if you run this and resize the window, you will see green when you don't resize, blue when you resize, and sometimes red.
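The sleep is needed because std::mutex makes no fairness guarantee: a render thread that unlocks and immediately locks again can keep winning. One possible alternative, sketched below under my own assumptions, is to let the message thread announce that it is waiting so the render thread backs off without a fixed delay. The class `DrawGate`, its methods `draw_from_ui` and `draw_from_render`, and the `run_demo` driver are hypothetical names, not part of the program above, and the sketch assumes the draw callable does not throw. Unlike the original code, the mutex is released while drawing, so the `busy_` flag carries the exclusion.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

namespace gate_sketch {

// Hypothetical helper: serializes draws and gives the message thread priority.
class DrawGate {
public:
    // Message-thread side: register as waiting first so the render thread backs off.
    template <class F>
    void draw_from_ui(F&& f) {
        std::unique_lock<std::mutex> lock(m_);
        ++ui_waiting_;
        cv_.wait(lock, [this] { return !busy_; });
        --ui_waiting_;
        run_exclusive(lock, f);
    }

    // Render-thread side: only starts a frame when no UI draw is pending.
    template <class F>
    void draw_from_render(F&& f) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !busy_ && ui_waiting_ == 0; });
        run_exclusive(lock, f);
    }

private:
    // Precondition: lock is held and busy_ is false.
    template <class F>
    void run_exclusive(std::unique_lock<std::mutex>& lock, F& f) {
        busy_ = true;
        lock.unlock();   // the draw itself runs outside the mutex
        f();             // assumed not to throw
        lock.lock();
        busy_ = false;
        lock.unlock();
        cv_.notify_all();
    }

    std::mutex m_;
    std::condition_variable cv_;
    bool busy_ = false;    // guarded by m_: a draw is in progress
    int ui_waiting_ = 0;   // guarded by m_: pending message-thread draws
};

struct DemoResult { int ui_draws; int render_draws; bool overlapped; };

// The caller plays the message thread while a second thread renders in a loop.
DemoResult run_demo(int ui_events) {
    DrawGate gate;
    std::atomic<bool> keep_running{true};
    std::atomic<int> inside{0};
    std::atomic<bool> overlapped{false};
    int ui_draws = 0, render_draws = 0;

    auto draw = [&](int& counter) {
        if (inside.fetch_add(1) != 0) overlapped = true;
        ++counter;
        inside.fetch_sub(1);
    };

    std::thread render([&] {
        while (keep_running) gate.draw_from_render([&] { draw(render_draws); });
    });
    for (int i = 0; i < ui_events; ++i) gate.draw_from_ui([&] { draw(ui_draws); });
    keep_running = false;
    render.join();
    return {ui_draws, render_draws, overlapped.load()};
}

}  // namespace gate_sketch
```

In the program above, the WM_WINDOWPOSCHANGING handler would call `draw_from_ui` and the render loop would call `draw_from_render`. This only addresses the lock handoff; whether it has any effect on the occasional red frame is untested.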

I looked into the composition mechanism, but I am not sure where it would fit into a program that uses a real-time backend to draw to the screen.

Also, to be clear, the problems I need to solve are:

Be sure the content keeps refreshing (the render loop keeps going) when a user moves the window or holds a resize position. In the single-threaded model, Windows blocks the refresh because of its modal move/size loop.
Ensure Windows does a clean refresh and does not draw garbage in the newly exposed pixels when you enlarge the window.

Interestingly, I don't see this garbage when you use the single-threaded option in this program, even when I resize the window rather quickly.

Anyway, if this can be done using a single-threaded approach and whatever new modern architecture available on Windows, I'd be super keen on using it.

(Speaking for recent Windows versions) You shouldn't need threads, tweaks or hacks, but should instead cooperate with DWM. You're supposed to *compose* your scene and let the DWM render it. The render loop that everyone uses (e.g. http://www.directxtutorial.com/Lesson.aspx?lessonid=9-1-4) is not the best way to integrate with today's Windows. Direct Composition (the newer version is called "Visual Layer" https://learn.microsoft.com/en-us/windows/apps/desktop/modernize/using-the-visual-layer-with-win32) is. A seminal example is this one https://gist.github.com/kennykerr/62923cdacaba28fedc4f3dab6e0c12ec — Simon Mourier, Apr 02 '23 at 06:34
But if you have a full reproducible DirectX sample, it would be better to work and exchange on code instead of talking english. — Simon Mourier, Apr 02 '23 at 06:36
@SimonMourier. I will try to put a sample together but as I am using a rather complex back end for the rendering, it would take me some time to write a sample from scratch. Thanks for the link. I will try to improve the question with a working example. — user18490, Apr 02 '23 at 08:33
This is not a simple minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example); it needs Dawn, etc., so it's difficult to help. FWIW, here is an example that uses Visual Composition (it's C# but easy to understand) https://github.com/smourier/DirectN/tree/master/DirectN/DirectN.WinUI3.MinimalD3D11 which you can compare with the original code that has the standard message pump: https://gist.github.com/d7samurai/abab8a580d0298cb2f34a44eec41d39d — Simon Mourier, Apr 02 '23 at 14:21
@Simon Mourier: a program that compiles from a single file is minimal :) It just needs Dawn, yes :) Thanks for sharing your repos. I looked at the original C++ implementation and I don't see it using the Composition mechanism. All it does is present the buffer from the swap chain. So nothing different from what I do? — user18490, Apr 02 '23 at 17:40
Precisely, the original C++ doesn't use Visual Composition, the C# does, it doesn't use a "render loop", the message pump is hidden and not tweaked. — Simon Mourier, Apr 02 '23 at 18:08
Thanks for clearing this up :) I will look again. Now the question remains: will this solve the "content of the window not being drawn when the user moves the window" problem? Does Direct Composition bypass the event loop entirely? — user18490, Apr 02 '23 at 20:43
