OpenCV - Extract letters from string using python

Question

I have an image from which I want to extract each character individually.

I want something like THIS OUTPUT, and so on for every character.

What would be the appropriate approach to do this using OpenCV and python?

You have provided the same links for sample and output1. – frederick99 Feb 28 '17 at 07:04
@frederick99 Sorry, please check it again now. – Bits Feb 28 '17 at 07:08

score 8 · Answer 1 · edited Feb 18 '19 at 18:44

A short addition to Amitay's awesome answer. You should negate the image using

cv2.THRESH_BINARY_INV

to capture black letters on white paper.

Another idea is to use the MSER blob detector, like this:

import cv2

img = cv2.imread('path to image')
(img_h, img_w) = img.shape[:2]
image_size = img_h * img_w
mser = cv2.MSER_create()
mser.setMaxArea(image_size // 2)  # integer division, since the area is a pixel count
mser.setMinArea(10)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
_, bw = cv2.threshold(gray, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# rects holds one (x, y, w, h) bounding box per detected region
regions, rects = mser.detectRegions(bw)

# With the rects you can e.g. crop the letters
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x+w, y+h), color=(255, 0, 255), thickness=1)

This also finds each letter as a whole region, so every rect encloses one complete character.
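The cropping mentioned in the code comment could look roughly like this. It is only a sketch: crop_letters and its pad parameter are hypothetical names, and the rectangles are assumed to be (x, y, w, h) boxes like the ones MSER returns. Boxes are sorted by x so the crops come out in reading order for a single line of text.

```python
import numpy as np

def crop_letters(image, rects, pad=2):
    """Cut one sub-image per (x, y, w, h) box, ordered left to right."""
    height, width = image.shape[:2]
    crops = []
    for (x, y, w, h) in sorted(rects, key=lambda r: r[0]):
        # Add a small margin, clamped to the image borders.
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, width), min(y + h + pad, height)
        crops.append(image[y0:y1, x0:x1].copy())
    return crops

# Synthetic example: a blank 20x50 image and two boxes given out of order.
image = np.zeros((20, 50), dtype=np.uint8)
rects = [(30, 5, 8, 10), (5, 5, 6, 10)]
crops = crop_letters(image, rects)
```

Each crop can then be written out with cv2.imwrite or passed to a classifier.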

edited Feb 18 '19 at 18:44 by Zoe

answered Nov 27 '17 at 15:41 by crazzle

Your idea is great and it is working in most cases, but sometimes it detects two characters as one. Do you know a way to optimize it to get a perfect character segmentation? – t2t Nov 19 '19 at 10:21
Besides tweaking the MSER parameters, you can use dilate + erode to increase the gap (use it on a mask and crop from the original image afterwards). Sorry for the late reply though. – crazzle Feb 17 '20 at 08:17
In very difficult scenarios (a lot of noise in the image) this is not working very well with cv2. I'm going to build my own model to separate the chars. – t2t Feb 17 '20 at 16:31

score 1 · Answer 2 · edited May 23 '17 at 10:29

You can do the following (OpenCV 3.0 and above):

1. Run Otsu thresholding on the image (http://docs.opencv.org/3.2.0/d7/d4d/tutorial_py_thresholding.html).
2. Run connected component labeling with stats on the thresholded image (How to use openCV's connected components with stats in python?).
3. For each connected component, take the bounding box from the stats you got in step 2, which hold the following information for each component: cv2.CC_STAT_LEFT, cv2.CC_STAT_TOP, cv2.CC_STAT_WIDTH, cv2.CC_STAT_HEIGHT.
4. Using the bounding box, crop the component from the original image.
