I am trying to create an application in Python that needs to capture the contents of a window application that uses the GPU to generate an image.
I initially tried using ffmpeg with gdigrab to capture the window content, but it results in a gray screen. After some research, I believe this is because gdigrab cannot capture content from a window that uses GPU acceleration.
I then tried the mss library in Python. I could capture the entire screen or a specified region without a problem, but capturing a specific application window that is overlapped by other windows also captures the overlapping windows, not just the target application.
Interestingly, OBS Studio can capture the window content correctly when the "Capture Method" is set to "Windows 10 (1903 and newer)", but not when it's set to "BitBlt (Windows 7 and newer)". However, OBS does not offer a native API for Python that could allow for easy integration with other applications.
Is there a way to capture the content of a GPU-accelerated window in Python, using ffmpeg or some other library or application?
Any advice or direction would be appreciated.
EDIT: I found an option that manages to get the image directly from the application window: https://stackoverflow.com/a/76399855/14861684
I tried to implement this solution in my application, using the capture_win_alt() function to capture each frame and piping its output to ffmpeg, which builds a recording from the received frames. Unfortunately, the quality of the recording is not satisfactory. For example, a recording that should last 60 s at 60 FPS (3600 frames in total) actually lasts only about 5 seconds and plays back far too fast.
It is as if I wanted to record 10 seconds at 20 FPS but captured only one frame per second. That gives 10 captured frames instead of 200, and because the recording is still encoded at 20 FPS, the result is 10/20 = 0.5 s of video instead of 10 s. The recording is therefore much shorter than it should be and looks very sped up.
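To make the arithmetic concrete, here is a minimal sketch of the mismatch I think I am seeing. The function name playback_duration and its parameters are only illustrative; they are not part of my application or of ffmpeg.

```python
def playback_duration(capture_seconds, captured_fps, encoded_fps):
    """Return (frames actually captured, resulting video length in seconds).

    Hypothetical helper: it assumes every captured frame is written to the
    encoder exactly once and the encoder plays them back at encoded_fps.
    """
    frames = capture_seconds * captured_fps
    return frames, frames / encoded_fps


# 10 s of wall-clock time, 1 frame captured per second, encoded at 20 FPS
frames, seconds = playback_duration(10, 1, 20)
print(frames, seconds)
```

If the capture loop only delivers a fraction of the frames the encoder expects, the video shrinks by the same fraction, which matches what I observe.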
My current implementation seems very inefficient. How can I synchronize the rate at which frames are captured with the FPS of the recording? And how can I guard against lost frames, for example by inserting a black frame or by using some form of dynamic (variable) FPS?
PS: Accurate recording timing is very important to me, because my application simultaneously tracks various data and records it with timestamps, so that the occurrence of an event can be matched to the corresponding moment in the recording.
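One direction I am considering, as a rough sketch only: timestamp each captured frame, then fill every output slot of a constant-frame-rate recording with the most recent frame available at that slot's time, repeating a frame when the capture falls behind. The function assign_frames_to_slots and its arguments are hypothetical names of mine, and I am assuming the capture timestamps are seconds since the start of recording (for example taken from time.perf_counter()).

```python
def assign_frames_to_slots(capture_times, fps, duration):
    """Map each output slot k (at time k / fps) to a captured frame index.

    capture_times: sorted timestamps in seconds, relative to recording start.
    Returns a list with one entry per output frame: the index of the latest
    frame captured at or before the slot time, or None if nothing has been
    captured yet (such a slot could be written as a black frame).
    """
    slots = []
    latest = -1  # index of the newest frame usable for the current slot
    total_slots = int(round(duration * fps))
    for k in range(total_slots):
        slot_time = k / fps
        # advance to the newest frame that is not in the future of this slot
        while latest + 1 < len(capture_times) and capture_times[latest + 1] <= slot_time:
            latest += 1
        slots.append(latest if latest >= 0 else None)
    return slots


# two frames captured at 0.0 s and 1.0 s, output at 2 FPS for 2 s
print(assign_frames_to_slots([0.0, 1.0], fps=2, duration=2))
```

With this approach the recording would always contain duration * fps frames, so its length should match wall-clock time even when frames are lost, but I do not know whether it is the right way to do it.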