
I'm running a neural network on a live stream of 800x600 screenshots. Since I was only getting about 3fps, I did some troubleshooting and measured roughly how much time each step takes (a timing sketch follows the list):

  • Screenshot: 12ms
  • Image processing: 280ms
  • Object detection and box visualisation: 16ms
  • Displaying image: 0.5ms
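
(For reference, a rough sketch of how each step can be timed with time.perf_counter; this isn't the exact measurement code I used, and the processing lines from the loop below are only indicated by a comment here.)

import time
import mss

monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

with mss.mss() as sct:
    t0 = time.perf_counter()
    image = sct.grab(monitor)        # Screenshot
    t1 = time.perf_counter()
    # ... the image-processing lines from the loop below ...
    t2 = time.perf_counter()
    print(f"screenshot: {(t1 - t0) * 1000:.1f} ms, processing: {(t2 - t1) * 1000:.1f} ms")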

I'm using the mss library for taking the screenshots.

Here's the code without the object detection part:

import numpy as np
import cv2
from PIL import Image
import mss
monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

with mss.mss() as sct:
    while True:

        # Screenshot:
        image = sct.grab(monitor)

        # Image processing:
        image = Image.frombytes("RGB", image.size, image.bgra, "raw", "RGBX")
        (im_width, im_height) = image.size
        image_np = np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

        # Object detection and box visualisation:
        # ...

        # Displaying image:
        cv2.imshow("Object Detection", image_np)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # imshow needs waitKey to refresh the window
            break

Any ideas on how I can make this quicker?

Cedric
  • Can't help you without the entire "image processing" code that takes 280ms. – karlphillip Feb 08 '20 at 19:06
  • I added some comments to the code to show which lines do what. The 3 lines below "Image processing:" take 280ms. Is that what you meant? – Cedric Feb 08 '20 at 21:24
  • Measure each of those lines individually to pinpoint the culprit: my guess is that `Image.frombytes()` is the cause of the lag. – karlphillip Feb 08 '20 at 23:40
  • frombytes() only takes about 2ms, .size takes 0.5ms. Third line is causing the lag. – Cedric Feb 09 '20 at 00:19
  • Why do you make a PIL Image from the screen-grab bytes, then take all the bytes out of the PIL Image and make a list of the wrong shape, then make a Numpy array and then resize it? You could grab straight to a Numpy array with `img = np.array(sct.grab(monitor))`. – Mark Setchell Feb 09 '20 at 08:14
  • @Cedric I added an answer. If it helped you, upvote it. If it solved your problem, click on the checkbox near it to select it as the official problem solver. – karlphillip Feb 09 '20 at 09:28
  • @MarkSetchell My code was just a collection of pieces copied from documentation examples. I suspected it was doing unnecessary back-and-forth converting, but I didn't know for sure because I didn't know what those functions (frombytes, getdata, astype) actually did. – Cedric Feb 09 '20 at 12:30

3 Answers


With 280ms of processing per frame, you are only ever going to get 3-4 frames/sec. You pretty much have only two choices.

Either share your code and hope we can improve it.

Or use multiprocessing with, say, 4 CPU cores and give the first frame to the first core, the second to the second and so on, round-robin. You could then get a frame out roughly every 70ms, i.e. around 14 fps.
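
A minimal sketch of that round-robin idea with Python's multiprocessing.Pool. The process_frame() function here is just a stand-in for the expensive per-frame work, not the asker's actual processing code:

import multiprocessing as mp

import cv2
import numpy as np
import mss

monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

def process_frame(frame):
    # stand-in for the ~280ms of per-frame work (processing + detection)
    return cv2.cvtColor(frame, cv2.COLOR_BGRA2BGR)

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool, mss.mss() as sct:
        pending = []
        while True:
            frame = np.array(sct.grab(monitor))                 # grab in the main process
            pending.append(pool.apply_async(process_frame, (frame,)))
            if len(pending) >= 4:                               # keep 4 frames in flight
                result = pending.pop(0).get()                   # oldest frame finishes first
                cv2.imshow("Object Detection", result)
                if cv2.waitKey(1) & 0xFF == ord("q"):
                    break

Frames are handed out as they arrive and collected in submission order, so the display stays in sequence while four frames are being processed in parallel.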

Mark Setchell
  • I'm out of votes for the day, but I agree. I can't see anything blatantly bad in there. – AMC Feb 08 '20 at 21:41
  • Thanks for your response. I've added the full code but I don't really understand why that would help since I had already included the 'Image processing' lines in the first piece of code. – Cedric Feb 08 '20 at 23:08

The problem is that your approach starts with a BGRA image format. That's a lot of data and it's probably unnecessary. There might be more efficient ways of grabbing the screenshot and converting it to an OpenCV image. Here's an approach that takes about 56ms on my slow machine:

import ctypes
import datetime
import cv2
import numpy as np

from PIL import ImageGrab


# workaround to allow ImageGrab to capture the whole screen
user32 = ctypes.windll.user32
user32.SetProcessDPIAware()

# measure running time
start_time = datetime.datetime.now()

# grab the 800x600 region from the question; bbox is (left, top, right, bottom)
image = np.array(ImageGrab.grab(bbox=(0, 40, 800, 640)))

# convert from RGB to BGR order so that colors are displayed correctly
mat = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

# compute elapsed time
delta = datetime.datetime.now() - start_time
elapsed_time_ms = int(delta.total_seconds() * 1000)
print('* Elapsed time:', elapsed_time_ms, 'ms')

cv2.imshow('mat', mat)
cv2.waitKey()
karlphillip
  • If you don't need the image to be in BGR format, you can leave it as RGB by getting rid of `cv2.cvtColor()` and making this solution even faster. – karlphillip Feb 09 '20 at 09:35
  • Running your answer, in combination with trying some things out myself solved my problem. I replaced ImageGrab with the mss module for taking a screenshot (100ms vs 10ms on my machine). I then used RGBA2RGB because I was getting an error: Cannot feed value of shape (1, 600, 800, 4) for Tensor 'image_tensor:0', which has shape '(?, ?, ?, 3)'. Now it's running at about 30fps. – Cedric Feb 09 '20 at 12:46
  • As an aside, which hardware are you running? You mentioned 56ms but on my laptop (i9-9980hk, rtx2080 max q) it was taking about double that. I suspect ImageGrab was bottlenecked by the integrated graphics (UHD Graphics 630). – Cedric Feb 09 '20 at 12:50
  • Notebook with an Intel i5-8250U @ 1.6GHz (1.8GHz boost). Video card is an Intel UHD Graphics 620. We understand that the final and ideal implementation you came up with might contain improvements that were not originally mentioned on this thread, but if you reflect on who helped you identify the problem and then offered an alternative solution... you might come to the conclusion that this answer deserves the checkbox that selects it as the official problem solver. – karlphillip Feb 09 '20 at 13:01
  • Your answer did indeed help me, which is why I upvoted it, but wouldn't this page be more helpful to future visitors if I edited the title to "Faster way of feeding screenshot stream into Object Detection neural network" and marked my own answer as the solution? – Cedric Feb 09 '20 at 13:10
  • It's fine to accept your own answer, however, I asked you to please not change the title since this is a problem that really has nothing to do with object detection or neural networks. Feel free to remove the obj detection code that you added later to the question since it really doesn't add any relevant details to help identify the problem. – karlphillip Feb 09 '20 at 13:14
  • Ok, I'll leave the title. I had already removed the object detection code. – Cedric Feb 09 '20 at 13:27

Using these lines instead of the "Image Processing:" lines from my first post solved my problem:

image = sct.grab(monitor)
image_np = np.array(image)
image_np = cv2.cvtColor(image_np, cv2.COLOR_RGBA2RGB)

I had previously already tried using only the first 2 lines, but I was getting this error:

ValueError: Cannot feed value of shape (1, 600, 800, 4) for Tensor 'image_tensor:0', which has shape '(?, ?, ?, 3)'

It hadn't occurred to me that converting the image from RGBA to RGB would fix this. I'm getting about 30fps now.
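
For completeness, a sketch of what the full loop looks like with those lines in place; the detection step is only a placeholder comment here, not the actual model code:

import numpy as np
import cv2
import mss

monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

with mss.mss() as sct:
    while True:
        # Screenshot (mss returns BGRA):
        image = sct.grab(monitor)

        # Image processing: straight to a NumPy array, then drop the alpha channel
        image_np = np.array(image)
        image_np = cv2.cvtColor(image_np, cv2.COLOR_RGBA2RGB)

        # Object detection and box visualisation would go here

        # Displaying image:
        cv2.imshow("Object Detection", image_np)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break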

Cedric