3

My goal is to detect the characters on images of this kind: [input image]

I need to improve the image so that Tesseract recognizes it more reliably, probably by doing the following steps:

  • Rotate the image so that the blue rectangle is horizontal [Need help on this]
  • Crop the image according to the blue rectangle [Need help on this]
  • Apply a thresholding filter and a gaussian blur
  • Use Tesseract to detect the characters

    import cv2
    import numpy as np
    import pytesseract
    from PIL import Image

    img = Image.open('grid.jpg')
    image = np.array(img.convert("RGB"))[:, :, ::-1].copy()  # RGB -> BGR for OpenCV


    # Need to rotate the image here and fill the blanks
    # Need to crop the image here

    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Otsu's thresholding
    ret3, th3 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Gaussian blur
    blur = cv2.GaussianBlur(th3, (5, 5), 0)

    # Save the image
    cv2.imwrite("preprocessed.jpg", blur)

    # Apply the OCR
    pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
    tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'

    preprocessed = Image.open('preprocessed.jpg')
    boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)
    

Here is the output image I get, which is not ideal for OCR: [output image]

OCR problems:

  • The blue rectangle is sometimes recognized as characters, which is why I would like to crop the image
  • Sometimes Tesseract recognizes the characters on a line as a word (GCVDRTEUQCEBURSIDEEC) and other times as individual letters. I would like it to always be a word.
  • The little pyramid at the bottom right is recognized as a character

Any other suggestions to improve the recognition are welcome

Yohan D
  • It's not particularly important for your current algorithm, but note that OpenCV uses BGR ordering by default while PIL (what you're using to load your images) stores them as RGB. When you're converting to grayscale, you're using BGR2GRAY even though your image input is actually RGB. This will give slightly different results, as blue is weighted a little more heavily to brightness than red. – alkasm Sep 24 '18 at 08:14
  • Indeed, I have edited the question – Yohan D Sep 24 '18 at 11:14
  • Solution for "blue rectangle": Find contour with max area [doc](https://docs.opencv.org/3.0.0/d4/d73/tutorial_py_contours_begin.html). – Slawomir Orlowski Sep 24 '18 at 11:42
  • Is there always a blue rectangle or did you just draw it on this image to help you get started? – Mark Setchell Sep 24 '18 at 11:55

4 Answers

2

Here's one idea for a way to proceed...

Convert to HSV, then start at each corner and progress towards the middle of the picture, looking for the nearest pixel to each corner that is somewhat saturated and has a hue matching the bluish surrounding rectangle. That gives you the four points marked in red:

[image: the four red corner points marked on the source photo]

Now use a perspective transform to shift each of those points to the corner to make the image rectilinear. I used ImageMagick but you should be able to see that I translate the top-left red dot at coordinates (210,51) into the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) gets shifted to (2064,0). The ImageMagick command in Terminal is:

convert wordsearch.jpg \
  -distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg

That results in this:

[image: perspective-corrected result]

The next issue is uneven lighting - namely the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove high frequencies (just a box-blur, or box-average is fine) so it now represents the slowly varying illumination. I then subtract the image from this so I am effectively removing background variations and leaving only high-frequency things - like your letters. I then normalize the result to make whites white and blacks black and threshold at 50%.

convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
   -compose difference -composite  -negate -normalize -threshold 50% final.jpg

[image: final thresholded image]

The result should be good for template matching if you know the font and letters or for OCR if you don't.

Mark Setchell
  • I wonder if you could improve the warping (so that it's not stretched in either direction) by checking the distance from letters; if you get some bounding boxes for them, on average, the letters should be equally spaced vertically and horizontally. – alkasm Sep 24 '18 at 18:44
  • @AlexanderReynolds Yes, good point. I just stretched to the same size as the original picture which has more spare canvas on the sides than the top so the letters do have a different aspect ratio. – Mark Setchell Sep 24 '18 at 18:49
2

Here's a slightly different approach using pyvips.

If the image is just rotated (i.e. there is little or no perspective distortion), you can take the FFT to find the angle of rotation. The nice, regular grid of characters will produce a clear set of lines in the transform, so the method should be very robust. This does the FFT on the entire image, but you could shrink the image a bit first if you need more speed.

import sys
import pyvips

image = pyvips.Image.new_from_file(sys.argv[1])

# to monochrome, take the fft, wrap the origin to the centre, get magnitude
fft = image.colourspace('b-w').fwfft().wrap().abs()

Making:

[image: FFT magnitude]

To find the angle of the lines, turn from polar to rectangular coordinates and look for horizontals:

def to_rectangular(image):
    xy = pyvips.Image.xyz(image.width, image.height)
    xy *= [1, 360.0 / image.height]
    index = xy.rect()
    scale = min(image.width, image.height) / float(image.width)
    index *= scale / 2.0
    index += [image.width / 2.0, image.height / 2.0]
    return image.mapim(index)

# sum of columns, sum of rows
cols, rows = to_rectangular(fft).project()

Making:

[image: FFT unrolled to polar coordinates]

With a projection of:

[image: row projection with a clear peak]

Then just look for the peak and rotate:

# blur the rows projection a bit, then get the maxpos
v, x, y = rows.gaussblur(10).maxpos()

# and turn to an angle in degrees we should counter-rotate by
angle = 270 - 360 * y / rows.height

image = image.rotate(angle)

[image: rotated image]

To crop, I took the horizontal and vertical projections again, then searched for peaks with B > G.

cols, rows = image.project() 

h = (cols[2] - cols[1]) > 10000
v = (rows[2] - rows[1]) > 10000

# search in from the edges for the first non-zero value
cols, rows = h.profile()
left = rows.avg()

cols, rows = h.fliphor().profile()
right = h.width - rows.avg()
width = right - left

cols, rows = v.profile()
top = cols.avg()

cols, rows = v.flipver().profile()
bottom = v.height - cols.avg()
height = bottom - top

# move the crop in by a margin
margin = 10
left += margin
top += margin
width -= 2 * margin
height -= 2 * margin

# and crop!
image = image.crop(left, top, width, height)

To make:

[image: cropped image]

And finally to remove the background, blur with a large radius and subtract:

image = image.colourspace('b-w').gaussblur(70) - image

To make:

[image: background-removed result]

jcupitt
1

Here are my steps to recognize the chars:

(1) detect the blue in HSV space, approximate the inner blue contour and sort the corner points
(2) find the perspective transform matrix and do the perspective transform
(3) threshold it (and find the characters)
(4) use `mnist` algorithms to recognize the chars

step (1) find the corners of the blue rect

See: Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)

[image: blue mask] [image: detected corners]

step (2) crop

[image: cropped result]

step (3) threshold (and find the chars)

[image: thresholded result] [image: detected characters]

step (4) is still in progress...

Kinght 金
0

I think it's better to remove the color instead of cropping.

It could be done with OpenCV; see: python - opencv morphologyEx remove specific color

ARR