
I am using MSS to capture screenshots of my screen because it is fast.

But I am not sure how to capture a specific window on macOS. I know there is win32 for Windows users. The code I have now is just a constant loop capturing my main monitor.

main.py :

import cv2 as cv
import numpy as np
from time import time
from mss import mss


def window_capture():
    loop_time = time()

    with mss() as sct:
        monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

        while True:

            # mss returns BGRA pixels; drop the alpha channel for OpenCV
            screenshot = np.array(sct.grab(monitor))
            screenshot = cv.cvtColor(screenshot, cv.COLOR_BGRA2BGR)

            cv.imshow('Computer Vision', screenshot)

            print('FPS {}'.format(1 / (time() - loop_time)))
            loop_time = time()

            if cv.waitKey(1) == ord('q'):
                cv.destroyAllWindows()
                break


window_capture()

print('Done.')
Blue

3 Answers


I wrote the following piece of Objective-C that gets the owner name, window name, window id, owner PID and on-screen position of every window in macOS. I saved it as windowlist.m and compiled it with the commands in the comments at the top of the file:

////////////////////////////////////////////////////////////////////////////////
// windowlist.m
// Mark Setchell
//
// Get list of windows with their characteristics
//
// Compile with:
// clang windowlist.m -o windowlist -framework coregraphics -framework cocoa
//
// Run with:
// ./windowlist
//
////////////////////////////////////////////////////////////////////////////////
#include <Cocoa/Cocoa.h>
#include <CoreGraphics/CGWindow.h>

int main(int argc, char **argv)
{
   NSArray *windows = (NSArray *)CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly,kCGNullWindowID);
   for(NSDictionary *window in windows){
      int WindowNum = [[window objectForKey:(NSString *)kCGWindowNumber] intValue];
      NSString* OwnerName = [window objectForKey:(NSString *)kCGWindowOwnerName];
      int OwnerPID = [[window objectForKey:(NSString *) kCGWindowOwnerPID] intValue];
      NSString* WindowName= [window objectForKey:(NSString *)kCGWindowName];
      CFDictionaryRef bounds = (CFDictionaryRef)[window objectForKey:(NSString *)kCGWindowBounds];
      CGRect rect;
      CGRectMakeWithDictionaryRepresentation(bounds,&rect);
      printf("%s:%s:%d:%d:%f,%f,%f,%f\n",[OwnerName UTF8String],[WindowName UTF8String],WindowNum,OwnerPID,rect.origin.x,rect.origin.y,rect.size.height,rect.size.width);
   }
}

It gives output like this. Each line holds colon-separated fields: owner name, window name, window id, owner PID, and then the window's top-left corner (x, y), height and width. You can either run this program as-is from Python with subprocess.Popen() and parse the window list, or convert it to Python using the PyObjC module:

Location Menu:Item-0:4881:1886:1043.000000,0.000000,22.000000,28.000000
Backup and sync from Google:Item-0:1214:8771:1071.000000,0.000000,22.000000,30.000000
Dropbox:Item-0:451:1924:1101.000000,0.000000,22.000000,28.000000
NordVPN IKE:Item-0:447:1966:1129.000000,0.000000,22.000000,26.000000
PromiseUtilityDaemon:Item-0:395:1918:1155.000000,0.000000,22.000000,24.000000
SystemUIServer:AppleTimeMachineExtra:415:1836:1179.000000,0.000000,22.000000,40.000000
SystemUIServer:AppleBluetoothExtra:423:1836:1219.000000,0.000000,22.000000,30.000000
SystemUIServer:AirPortExtra:409:1836:1249.000000,0.000000,22.000000,30.000000
SystemUIServer:AppleVolumeExtra:427:1836:1279.000000,0.000000,22.000000,30.000000
SystemUIServer:BatteryExtra:405:1836:1309.000000,0.000000,22.000000,67.000000
SystemUIServer:AppleClockExtra:401:1836:1376.000000,0.000000,22.000000,123.000000
SystemUIServer:AppleUser:419:1836:1499.000000,0.000000,22.000000,99.000000
Spotlight:Item-0:432:1922:1598.000000,0.000000,22.000000,36.000000
SystemUIServer:NotificationCenter:391:1836:1634.000000,0.000000,22.000000,46.000000
Window Server:Menubar:353:253:0.000000,0.000000,22.000000,1680.000000
Dock:Dock:387:1835:0.000000,0.000000,1050.000000,1680.000000
Terminal:windowlist — -bash — 140×30:4105:6214:70.000000,285.000000,658.000000,1565.000000
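
If you go the subprocess route, each output line still has to be turned into something MSS can use. Below is a minimal sketch of that parsing step, under a few assumptions: parse_window_line is a hypothetical helper name, the line format is exactly the one printed by windowlist above, and owner names contain no colons (window names may). The "region" dictionary uses the same keys as the monitor dictionary in the question.

```python
def parse_window_line(line):
    """Parse one 'owner:name:id:pid:x,y,height,width' line into a dict."""
    # Split from the right so that colons inside the window name survive.
    head, window_id, owner_pid, geometry = line.rstrip("\n").rsplit(":", 3)
    # Assumption: the owner name itself contains no colon.
    owner, _, name = head.partition(":")
    # windowlist prints height before width.
    x, y, height, width = (float(v) for v in geometry.split(","))
    return {
        "owner": owner,
        "name": name,
        "window_id": int(window_id),
        "owner_pid": int(owner_pid),
        # Same keys as the monitor dict passed to sct.grab() in the question.
        "region": {
            "left": int(x),
            "top": int(y),
            "width": int(width),
            "height": int(height),
        },
    }


# Sample line taken from the output above.
sample = (
    "Terminal:windowlist — -bash — 140×30:4105:6214:"
    "70.000000,285.000000,658.000000,1565.000000"
)
info = parse_window_line(sample)
print(info["owner"], info["name"], info["region"])
```

In real use the lines would come from running ./windowlist through subprocess, and info["region"] could then be passed to sct.grab().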
Mark Setchell
import cv2 as cv
import numpy as np
from time import time
from mss import mss
import Quartz

windowName = "Window Name like the name written on top of the window"


def get_window_dimensions(hwnd):
    """Return the window's on-screen bounds as an mss region, or None if not found."""
    window_info_list = Quartz.CGWindowListCopyWindowInfo(Quartz.kCGWindowListOptionIncludingWindow, hwnd)

    for window_info in window_info_list:
        window_id = window_info[Quartz.kCGWindowNumber]
        if window_id == hwnd:
            bounds = window_info[Quartz.kCGWindowBounds]
            # Quartz reports bounds as floats; cast so mss gets whole-pixel values
            return {"top": int(bounds['Y']), "left": int(bounds['X']),
                    "width": int(bounds['Width']), "height": int(bounds['Height'])}

    return None


def window_capture():
    loop_time = time()
    windowList = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionAll, Quartz.kCGNullWindowID)

    # Look for a window whose title contains windowName (case-insensitive)
    hwnd = None
    for window in windowList:
        # The window name can be missing or None, so fall back to ''
        title = window.get(Quartz.kCGWindowName) or ''
        print(title)
        if windowName.lower() in title.lower():
            hwnd = window[Quartz.kCGWindowNumber]
            print('found window id %s' % hwnd)

    # Fail with a clear message instead of a NameError when nothing matched
    if hwnd is None:
        raise RuntimeError('No window found with a title containing %r' % windowName)

    monitor = get_window_dimensions(hwnd)

    with mss() as sct:
        # monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

        while True:

            # mss returns BGRA pixels; drop the alpha channel for OpenCV
            screenshot = np.array(sct.grab(monitor))
            screenshot = cv.cvtColor(screenshot, cv.COLOR_BGRA2BGR)

            cv.imshow('Computer Vision', screenshot)

            print('FPS {}'.format(1 / (time() - loop_time)))
            loop_time = time()

            if cv.waitKey(1) == ord('q'):
                cv.destroyAllWindows()
                break


window_capture()

print('Done.')

This is a solution for macOS. It's easy to find the answer for Windows, but not so for Mac, so here's the code. It runs at a solid 40 FPS for me; doing the same with cv2 or pyautogui gives about 5 FPS.
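
The window lookup can also be done in a single pass over the window list, without a second Quartz call. The sketch below is only an illustration: find_window_region is a hypothetical helper, and fake_windows is made-up stand-in data shaped like the entries returned by CGWindowListCopyWindowInfo, so that the logic can be tried without Quartz installed. It assumes the first matching title is the window you want.

```python
def find_window_region(windows, title_substring):
    """Return (window_id, mss_region) for the first matching title, else None."""
    wanted = title_substring.lower()
    for info in windows:
        # kCGWindowName can be missing or None, so fall back to an empty string.
        title = info.get("kCGWindowName") or ""
        if wanted in title.lower():
            bounds = info["kCGWindowBounds"]
            # Bounds are floats; cast them to whole pixels for the region dict.
            region = {
                "left": int(bounds["X"]),
                "top": int(bounds["Y"]),
                "width": int(bounds["Width"]),
                "height": int(bounds["Height"]),
            }
            return info["kCGWindowNumber"], region
    return None


# Hypothetical stand-in data; real entries come from CGWindowListCopyWindowInfo.
fake_windows = [
    {"kCGWindowNumber": 353, "kCGWindowName": None,
     "kCGWindowBounds": {"X": 0.0, "Y": 0.0, "Width": 1680.0, "Height": 22.0}},
    {"kCGWindowNumber": 4105, "kCGWindowName": "RuneScape",
     "kCGWindowBounds": {"X": 70.0, "Y": 285.0, "Width": 800.0, "Height": 600.0}},
]
match = find_window_region(fake_windows, "runescape")
print(match)
```

Note that MSS grabs a screen region, not the window itself, so anything overlapping that region ends up in the capture too.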

Prem Sinha

If you want to open the Chrome browser, you can use Python's built-in webbrowser package. You will need to supply a path to the Chrome app, e.g.: webbrowser.get(r'open -a /Applications/Google\ Chrome.app %s').open('http://docs.python.org/')

Once the browser is open, the window will be wherever it was last left. MSS does not let you select an app. Instead, you can grab the entire screen or a fixed region (as you've specified with monitor = {"top": 40, "left": 0, "width": 800, "height": 600}). Therefore you might want to force the browser to go full screen. This can be achieved by using the pyautogui package to send the hotkeys.

import webbrowser
import pyautogui

def openApp(url, appPath):
    webbrowser.get(appPath).open(url)

def fullScreen():    
    pyautogui.hotkey('command', 'ctrl', 'f') # hotKeys for full screen mode in MacOS

url = 'http://docs.python.org/'    
appPath = r'open -a /Applications/Google\ Chrome.app %s' # macOS (raw string keeps the backslash-escaped space)
#appPath = 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s' # Windows
#appPath = ' /usr/bin/google-chrome %s' #Linux
openApp(url, appPath)
fullScreen()
# here you can add logic to take screenshots

(Note: I've only tested this on Windows, but it should work on macOS.)

Greg
  • I like this, but I am trying to open an app. If you are curious which app, it would be RuneScape. – Blue Jul 09 '20 at 17:49
  • I will mark this as the answer though, because it gave me a better general idea of where to go. I think I can use something like import os, and then os.startfile? – Blue Jul 09 '20 at 17:50
  • os.startfile may be Windows-only. This should work: os.system('C:\dev\HelloWorld.exe'). See https://stackoverflow.com/questions/13222808/how-to-run-external-executable-using-python/13222809 – Greg Jul 09 '20 at 18:37
  • This doesn't answer the question, so why is it marked as correct? – Iron Attorney Jun 17 '22 at 12:50