Improving Bitmap retrieval (from screen/window) speed and storage for multi-threaded access using C#

Question

I am having a couple problems trying to read in screenshots on the fly as well as multithreaded access to those screenshots (Bitmap).

For starters, capturing the screen data I have the following code:

NativeMethods.GetWindowRect(HandleDisplay, out r);
Size s = new Size(r.Right - r.Left, r.Bottom - r.Top);
Bitmap b = new Bitmap(s.Width, s.Height, PixelFormat.Format24bppRgb);
using (Graphics gfxBmp = Graphics.FromImage(b))
{
    try {
        gfxBmp.CopyFromScreen(r.Left, r.Top, 0, 0, s, CopyPixelOperation.SourceCopy);
    }
    finally
    {
        gfxBmp.Dispose();
    }
}

Now the code is working as expected and the Bitmap created is correct. The first problem I am running into with this code is the speed. I haven't timed out the speed in milliseconds for execution, however I can say that the code is wrapped and executed to update a Bitmap repeatedly in its own thread, and on a screen that is 1000 x 600 pixels it updates at around 45 to 50 times per second. If the screen size is larger the execution rate drops significantly, for example capturing the window that is 1440 x 900 will drop the updates to 20 times per second or less.

The second problem I am having is the bitmap is being stored/updated repeatedly from one thread, and I have other threads that need to access that Bitmap (or portion of). The problem is that when the main update thread writes to the Bitmap, and another thread attempts to read the bitmap, I obviously get the "Object already in use elsewhere" error. By multithreaded I mean Tasks like this:

Task t = Task.Run(() => {
    while(true) {
       try {
           token.ThrowIfCancellationRequested();
           // update/process the data
           ...
       } catch {
           break;
       }
}, token);

So what is the best way to store the bitmap (ie as a bitmap or byte[] or some other way) and access it from other threads. I have been playing with the idea of converting it to byte[] data (using LockBits and Marshal.Copy) but haven't determined yet if that is the correct route.

Questions:

Is there a way to improve the speed of getting the image data (maybe using a byte[] and somehow only 'updating' the changed pixel data).
How can I make it thread safe.
- I will have one main thread updating it constantly (write only)
- I will have many threads that will need to read all or part (read only)

Caveat: The window may or may not be DirectX / OpenGL (which was why I chose this method over others originally)

EDIT/UPDATES:

I have made the following changes which has improved performance slightly. - gfxBmp is now a static property initialized with the instance constructor - removed the event handler (class no longer needs to throw events notifying that the image was updated) - access to image property is wrapped with ReaderWriterLockSlim (and reading from other threads seems to be working great)

As noted, all possible overhead has been removed and the Task now looks like this:

cts = new CancellationTokenSource();
CancellationToken token = cts.Token;
IsRunning = true;
Task t = Task.Run(() =>
{
    while (IsRunning)
    {
        try
        {
            token.ThrowIfCancellationRequested();
            Refresh();
            CountFrame();
        }
        catch
        {
            IsRunning = false;
            break;
        }
    }
}, token);

My refresh method looks like:

private void Refresh()
{
    NativeMethods.Rect r = new NativeMethods.Rect();
    NativeMethods.GetWindowRect(hWndDisplay, out r);
    if (r != rect) Rect = r;
    gfx.CopyFromScreen(rect.Left, rect.Top, 0, 0, size, CopyPixelOperation.SourceCopy);
}

The CountFrame is irrelevant in the speed as I benchmarked with and without as follows:

With CountFrame (Screen size captured 872px x 480px ~60fps):
Iterations: 1000, Total: 16.20 s, Average: 16.195 ms
Iterations: 1000, Total: 16.18 s, Average: 16.178 ms
Iterations: 1000, Total: 16.44 s, Average: 16.44 ms

Without CountFrame (Screen size captured 872px x 480px ~60fps):
Iterations: 1000, Total: 16.21 s, Average: 16.213 ms
Iterations: 1000, Total: 16.17 s, Average: 16.17 ms
Iterations: 1000, Total: 16.18 s, Average: 16.183 ms

Without CountFrame (Screen size captured 1402px x 828px ~32fps)
Iterations: 1000, Total: 29.74 s, Average: 29.744 ms
Iterations: 1000, Total: 30.08 s, Average: 30.083 ms
Iterations: 1000, Total: 29.44 s, Average: 29.444 ms

So the thread safety problem appears to be fixed, optimizations have been done as noted. So I am now still struggling with the speed. As you can see with the FrameRate drop at 1402 x 828, the question of a better way of capture is still not clear.

Currently I am using Graphics.CopyFromScreen(...) to get the pixels, which I understand uses bitblt internally. Is there maybe another method of getting the data from the region of the screen possibly with something like pixel/byte data comparisons and only updating changed information, or other resources I should be looking at?

For a start, you're spinning up new `Bitmap` and `Graphics` objects which will have a hefty overhead. The BitBlt will be quite quick but creating the objects and underlying DCs is time consuming. Much like in double buffering, you want an in-memory bitmap that you don't clean up until program end and all you need to worry about then is a memory copy. Also do away with the `try`/`finally` as it's unnecessary overhead and would be handled by the `Dispose()` anyway — DiskJunky, Sep 26 '17 at 16:00
In regard to the multi-threading locking/reading; https://stackoverflow.com/questions/1330507/best-c-sharp-solution-for-multithreaded-threadsafe-read-write-locking — DiskJunky, Sep 26 '17 at 16:02
@DiskJunky with regards to spinning up the new Bitmap and Graphics objects, I am in fact only spinning up the Graphics object each time, I modified the code for sake of understandability, but the Bitmap object is actually a private variable in the class that is only created on initial instansiation (constructor) or recreated if the window size changes. I will implement your other notes as well. On another note, I have made the class a singleton so it can be accessed anywhere without being passed through numerous arguments. — Aaron Murray, Sep 26 '17 at 16:14
Your `NativeMethods.Rect` could be created once as well. I'm very surprised at the lack of performance around the `CopyFromScreen()` method. Under the hood, the GDI32 library is very fast with this (remember, it's copying bits from one second of RAM to another) so the performance bottleneck is not obvious. I'm assuming you're not starting/stopping threads all the time but just spinning up what you need at the start? E.g., no `Task.Start()` on each loop iteration? — DiskJunky, Sep 27 '17 at 11:10
I fixed the NativeMethods.Rect issue. Further investigation however, I am in fact still encountering the multithreading problems. As a note: the Graphics.CopyFromScreen() is in fact bypassing the lock (setter) altogether. I can confirm that the gfx.FromImage(Screen) // Screen is the getter/setter for the private static screen bitmap. Temporary workaround to resolve I am doing CopyFromScreen to a temp bitmap and then copying it then to the 'Screen' (setter method) which fixes the lock, but creates a ton of overhead. Trying to work this issue out still now. Also, no tasks inside loop. — Aaron Murray, Sep 27 '17 at 19:14
Hmm, then based on that info, the bottleneck sounds like it's coming in from the thread synchronization in acquiring and getting locks. Ideally, only the setter has the lock - a read doesn't much care if it's the old or new, just that it gets one. To determine this you could use a `HighResolutionTimer` object to time the calls and pin down the exact code that's taking longest but I suspect it's around the thread locks based on info to date. https://msdn.microsoft.com/en-us/library/aa964692(v=vs.85).aspx — DiskJunky, Sep 27 '17 at 20:48
It occurs to me to ask: why are you drawing the screenshot from multiple tasks? This capture/draw cycle is effectively a synchronous task with no need (or benefit) from doing so via multiple tasks. Unless I'm missing something in the intent? What performance do you get from doing it all on the same (UI) thread and bypassing any/all locks/thread sync? — DiskJunky, Sep 27 '17 at 21:19
I am unclear on what you are asking. I am not drawing it on other threads, just the one thread is updating the bitmap (refreshing it repeatedly) within a singleton instance. Other threads read the bitmap from the singleton and image match (Emgu/OpenCV) or Ocr (Tesseract), portions of that bitmap. The UI thread does currently update a picturebox so I can see the current bitmap (for debugging/testing). — Aaron Murray, Sep 27 '17 at 21:33
Ah, ok, that wasn't clear from the post or comments. This still boils down to "is a dirty read acceptable"? If so, then reducing the locks/sync required should speed up the process overall. Only the setter needs a clean write but if it's just setting an object then no locking is needed at all - let the readers get what they need at that moment in time and forget about thread sync. That said, anything reading the bitmap values would need to persist the object instance and not repeatidly get it from a property. E.g.; at the start of each reader have `var tmp = Singleton.Image;` and use that — DiskJunky, Sep 27 '17 at 21:36
And even then, why can't each consumer call `CopyFromScreen()` directly? Let the OS/graphics driver sort out the sync — DiskJunky, Sep 27 '17 at 21:41
I think maybe this is a case of me getting inside my own head a bit. It started out as a thought about finding an easy way to pass around an image between different components and various threads. In reality your point of having each reader gathering the portion of the screen it needs via its own CopyFromScreen() call as it needs. I have some rethinking. — Aaron Murray, Sep 27 '17 at 22:21

score 0 · Answer 1 · answered Sep 29 '17 at 14:43

Great thanks goes out to @DiskJunky as his comments are the real answer:

1. Thread safety - moved the Graphics.CopyFromScreen() logic to each individual ready and only reading the necessary portion of the screen for the readers use. This allows for a single bitmap to be 'owned' by the reader, removing the need for thread locking.

2. Performance - with the above, the reader now becomes a bit heavier with the inclusion of the CopyFromScreen, however removes the need (code) for cropping the portion of the shared screen required and locking logic. Also where possible code was restructured to instantiate and reuse properties within the reader class whenever possible (ie. no longer needs to spin up a new Graphic object every time a CopyFromScreen is called) producing less overhead and clock cycles.

This came down to ultimately rethinking the way I was handling the logic:

Original logic was to have one thread that updated the bitmap constantly and shared it to all the readers that needed it when they needed it.

Optimized logic let the readers handle getting only the information necessary, and when necessary straight from the source instead of a middleman. Producing cleaner code with less overhead.

Improving Bitmap retrieval (from screen/window) speed and storage for multi-threaded access using C#

1 Answers1