
Summary: How can I get the image data (a screenshot) of an application by its window name on Linux, for further processing with Python 3.x?

I need to do some image detection on GUIs running on both Linux and Windows. I have working code for Windows 10 (below) that gets an application's image data using the win32 API and puts it into a numpy array for use with OpenCV. I now need a similar approach for Linux.

Importantly, it would be very helpful if the window did not have to be visible for the image data to be obtained. The windows will never be minimised, but they might be obscured by other applications, and I don't want to force them to become the active window just for what is essentially a screenshot.

I have come across this thread using the Gdk API, which works, but only for the application that invokes it. I have also read this thread, which suggests that PIL might be the way to go, but I'd appreciate some input on this matter.

Example of what I need, working on Windows 10:

import numpy as np
import win32gui, win32ui, win32con
import cv2 as cv

def get_img_data(window_name):
        window = win32gui.FindWindow(None, window_name)
        if not window:
            raise Exception('Window not found: {}'.format(window_name))

        # get the window size
        window_rect = win32gui.GetWindowRect(window)
        w = window_rect[2] - window_rect[0]
        h = window_rect[3] - window_rect[1]

        # account for the window border and titlebar and cut them off
        border_pixels = 8
        titlebar_pixels = 30
        w = w - (border_pixels * 2)
        h = h - titlebar_pixels - border_pixels
        cropped_x = border_pixels
        cropped_y = titlebar_pixels

        # set the cropped coordinates offset so we can translate screenshot
        # images into actual screen positions
        offset_x = window_rect[0] + cropped_x
        offset_y = window_rect[1] + cropped_y

        wDC = win32gui.GetWindowDC(window)
        dcObj = win32ui.CreateDCFromHandle(wDC)
        cDC = dcObj.CreateCompatibleDC()
        dataBitMap = win32ui.CreateBitmap()
        dataBitMap.CreateCompatibleBitmap(dcObj, w, h)
        cDC.SelectObject(dataBitMap)
        cDC.BitBlt((0, 0), (w, h), dcObj, (cropped_x, cropped_y), win32con.SRCCOPY)

        signedIntsArray = dataBitMap.GetBitmapBits(True)
        img = np.frombuffer(signedIntsArray, dtype='uint8')
        img.shape = (h, w, 4)

        # free resources
        dcObj.DeleteDC()
        cDC.DeleteDC()
        win32gui.ReleaseDC(window, wDC)
        win32gui.DeleteObject(dataBitMap.GetHandle())

        img = img[...,:3]

        img = np.ascontiguousarray(img)
        # cv.imwrite('./test.png', img) # uncomment to save the screenshot

        return img

img_data = get_img_data('Task Manager')
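
The only part of the function above that is not Windows-specific is the last step: turning a raw pixel buffer into an array OpenCV can use. A minimal sketch of that step on its own is below. The helper name `bgra_bytes_to_bgr` is hypothetical, and it assumes the buffer holds 4 bytes per pixel in B, G, R, A order, which is what the `(h, w, 4)` reshape above implies. Any Linux capture method that yields the same layout could presumably feed into it.

```python
import numpy as np


def bgra_bytes_to_bgr(raw, width, height):
    """Convert a raw 32-bit-per-pixel buffer into an (h, w, 3) array.

    Hypothetical helper: assumes `raw` is a bytes-like object laid out
    row by row with 4 bytes per pixel (B, G, R, A).
    """
    expected = width * height * 4
    if len(raw) != expected:
        raise ValueError(
            'Expected {} bytes for a {}x{} image, got {}'.format(
                expected, width, height, len(raw)))

    # View the bytes as uint8 without copying, then give them image shape
    img = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)

    # Drop the alpha channel and make the result contiguous in memory
    return np.ascontiguousarray(img[..., :3])


# Synthetic 2x1 image: one pure-blue pixel, one pure-red pixel (BGRA order)
raw = bytes([255, 0, 0, 255,
             0, 0, 255, 255])
img = bgra_bytes_to_bgr(raw, width=2, height=1)
print(img.shape)
```

Keeping this step separate means only the capture part would need a per-platform implementation.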

daviegravee