
I'm currently trying to write a script to detect text in an OBS video stream using Python/OpenCV.

From every n-th frame, I need to detect text within several fixed bounding regions (an example can be found in the attachment). The coordinates of these regions are constant across all video frames.

My questions:

  • is OpenCV the best approach to solve my task?
  • what OpenCV function should I use to specify multiple boundaries for text detection?
  • is there a way to use a video stream from OBS as an input to my script?

Thank you for your help!

Example:


1 Answer


I can't say anything about OBS, but OpenCV + Tesseract should be all you need. Since you know the location of the text precisely, it will be very easy to use. Here is a quite comprehensive tutorial on using both, which includes sections on finding where the text is in the image.

The code could look like this:

import cv2
import pytesseract

img = cv2.imread("...")  # or wherever you get your image from
region = [100, 200, 200, 400]  # region where the text is, as [y1, x1, y2, x2]
# Tesseract expects RGB; OpenCV uses BGR
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

output = pytesseract.image_to_string(img_rgb[region[0]:region[2], region[1]:region[3]])
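On the multiple-boundaries question: OpenCV images are just NumPy arrays, so there is no dedicated function for this; you simply slice out each region and OCR it separately. A minimal sketch, where the region coordinates are placeholders you would replace with the ones from your overlay:

```python
import numpy as np

# Placeholder regions as (y1, x1, y2, x2), matching the slicing order above.
REGIONS = [
    (100, 200, 200, 400),
    (50, 600, 90, 760),
]

def crop_region(img, region):
    """Slice one rectangular region out of an image array."""
    y1, x1, y2, x2 = region
    return img[y1:y2, x1:x2]

def ocr_regions(img_rgb, regions=REGIONS):
    """Run Tesseract on each fixed region and return the recognised strings."""
    import pytesseract  # imported here so crop_region stays dependency-free
    return [pytesseract.image_to_string(crop_region(img_rgb, r)).strip()
            for r in regions]
```

Since the coordinates never change between frames, you can define `REGIONS` once and reuse it for every frame you sample.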

The only other step that might be required is to invert the image so that it becomes dark text on a light background. Tips for that can be found here. For example, removing the red background in one of the boxes you highlighted might help with accuracy, which can be achieved by thresholding on red values: img_rgb[img_rgb[..., 0] > 250] = [255, 255, 255].
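Both of those preprocessing ideas can be combined in a small NumPy-only helper. This is a sketch: the red threshold of 250 comes from the snippet above, but the mean-brightness heuristic for deciding whether to invert is an assumption and may need tuning for your overlay:

```python
import numpy as np

def preprocess_for_ocr(img_rgb, red_thresh=250):
    """Whiten strongly red pixels, then invert if the crop is light-on-dark."""
    img = img_rgb.copy()
    # Replace saturated-red background pixels with white, as suggested above.
    img[img[..., 0] > red_thresh] = [255, 255, 255]
    # Heuristic (assumption): a mostly dark crop is light text on a dark
    # background, so invert it to get dark text on light for Tesseract.
    if img.mean() < 128:
        img = 255 - img
    return img
```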

As for reading your images in, this other question might help.
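One common route, as an assumption about your setup: OBS's "Start Virtual Camera" feature exposes the stream as an ordinary webcam device, which cv2.VideoCapture can read like any other camera. The device index and the sampling logic below are a sketch, not OBS-specific API:

```python
def frames_every_n(capture, n=30):
    """Yield every n-th frame from anything exposing a read() -> (ok, frame) method,
    e.g. a cv2.VideoCapture, until the stream ends."""
    idx = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if idx % n == 0:
            yield frame
        idx += 1

# Hypothetical usage (device index 0 is an assumption; it may differ per machine):
# import cv2
# cap = cv2.VideoCapture(0)
# for frame in frames_every_n(cap, n=30):
#     ...run the OCR step on `frame`...
# cap.release()
```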
