
I'm currently working on a project that will read in an image of a Sudoku grid, detect the grid, identify the digits, solve the puzzle, and overlay the solution on the image. To identify the digits, I've divided the grid into n^2 images, where each image is an individual cell (examples: ex2 ex9), and run them through pytesseract. However, no text is detected in any of my images, even though each one is just an image of a number with no noise, borders, etc.

I've tried the common methods: smoothing the image, various thresholding methods, resizing the image, inverting the image, and cropping the digit to a bounding box, but none of these seem to work. I've tested the code I wrote for pytesseract on other images and those all seem to work fine; it only fails on my images.
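For concreteness, here is a minimal NumPy-only sketch of the kind of preprocessing listed above (global threshold, inversion if needed, crop to the bounding box, then a white margin). The function name `prepare_cell` and the `threshold` and `border` defaults are illustrative assumptions, not the exact code used on these images.

```python
import numpy as np


def prepare_cell(cell, threshold=128, border=10):
    """Return a black-on-white, cropped, padded copy of a grayscale cell.

    Hypothetical helper: the name, threshold and border are assumptions.
    Returns None if no digit pixels are found.
    """
    cell = np.asarray(cell, dtype=np.uint8)

    # Global threshold: True where a pixel is dark enough to count as ink.
    ink = cell < threshold

    # If most of the cell counts as ink, the image is probably
    # light-on-dark, so flip the mask.
    if ink.mean() > 0.5:
        ink = ~ink

    # Bounding box of the ink pixels.
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        return None  # empty cell, nothing to recognize

    crop = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Black digit (0) on a white background (255), plus a white margin.
    out = np.where(crop, 0, 255).astype(np.uint8)
    return np.pad(out, border, mode="constant", constant_values=255)
```

The margin is there because a digit that touches the image edge is a commonly suggested cause of missed detections; whether it matters for these particular images is untested.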

Can anyone provide suggestions for what I could try and/or why my images seem to not be easily processed?

For reference, here is the setup for pytesseract's image_to_string I've been using:

text = image_to_string(im, config='--psm 10 --oem 3 -c tessedit_char_whitelist=123456789')
Vikrant
sansona
  • Relevant: [simple-digit-recognition-ocr-in-opencv-python](https://stackoverflow.com/questions/9413216/simple-digit-recognition-ocr-in-opencv-python?rq=1) – stovfl Oct 10 '18 at 05:37

1 Answer


I found a solution, but it is definitely not beautiful. I found that pytesseract was poor at recognizing images that contain only a few digits. I took inspiration from CNNs, which use "zero padding" when doing image recognition. Be aware that the only thing I took inspiration from is the name, not the method (which is far more complex than anything this post will come close to).

I found an image which contained a 0, and created a "zero padding image" with three zeros (an arbitrarily chosen digit and number of copies). With this padding added to each digit image, pytesseract was able to read the digits perfectly: 15/15 cases, rather than 3/15 cases. Remember to divide the OCR result by 1000 (if you use three 0's).
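A minimal sketch of the decoding step described above. It assumes the digit sits to the left of the zeros, which is what dividing by 1000 implies. The helper names `decode_padded` and `ocr_padded_cell` and the choice of `--psm 7` (treat the image as a single text line) are assumptions for illustration, not something tested in this answer.

```python
def decode_padded(ocr_text, n_zeros=3):
    """Recover the single digit from OCR text such as '5000'.

    Hypothetical helper. Returns None when the text does not look like
    one digit (1-9) followed by exactly n_zeros zeros.
    """
    # Keep only ASCII digits; OCR output often carries spaces or newlines.
    digits = "".join(ch for ch in ocr_text if ch in "0123456789")
    if not digits:
        return None

    value = int(digits)
    scale = 10 ** n_zeros
    if value % scale != 0:
        return None  # the padding zeros were not read back intact

    digit = value // scale
    return digit if 1 <= digit <= 9 else None


def ocr_padded_cell(padded_img, n_zeros=3):
    """Run Tesseract on an already padded image and decode the digit."""
    # Imported lazily so decode_padded can be used without Tesseract installed.
    import pytesseract

    # Assumption: '--psm 7' (single text line) suits a row of four digits.
    text = pytesseract.image_to_string(padded_img, config="--psm 7")
    return decode_padded(text, n_zeros)
```

Checking that the trailing zeros come back intact, instead of dividing blindly, gives a cheap sanity check on each read.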

I used the horizontal image stacking technique shown in this post: Image stacking post

import numpy as np
from PIL import Image


def concat_images(imga, imgb):
    """
    Place two images side by side (imga on the left, imgb on the right).

    imga, imgb: filenames (str) of the images to combine
    returns: PIL.Image.Image in 'LA' (grayscale + alpha) mode
    """
    # Load both images as (height, width, 2) arrays: luminance and alpha.
    imga = np.asarray(Image.open(imga).convert('LA'))
    imgb = np.asarray(Image.open(imgb).convert('LA'))

    ha, wa = imga.shape[:2]
    hb, wb = imgb.shape[:2]
    max_height = max(ha, hb)
    total_width = wa + wb

    # The canvas starts as all zeros; any area not covered by an image stays zero.
    new_img = np.zeros(shape=(max_height, total_width, 2), dtype='uint8')
    new_img[:ha, :wa] = imga
    new_img[:hb, wa:wa + wb] = imgb

    return Image.fromarray(new_img)
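Since `concat_images` works on filenames, here is a sketch of an in-memory variant for cells that have already been cropped out of the grid as 2-D grayscale arrays. The name `pad_with_zeros`, the white `background` fill, and the digit-first ordering are assumptions, not part of the function above.

```python
import numpy as np


def pad_with_zeros(digit, zero, n_zeros=3, background=255):
    """Stack one digit image and n_zeros copies of a '0' image horizontally.

    Hypothetical helper. Both inputs are 2-D grayscale arrays; the digit
    goes on the left so the OCR result reads like '5000'.
    """
    digit = np.asarray(digit, dtype=np.uint8)
    zero = np.asarray(zero, dtype=np.uint8)

    tiles = [digit] + [zero] * n_zeros
    height = max(t.shape[0] for t in tiles)
    total_width = sum(t.shape[1] for t in tiles)

    # Fill the canvas with the background value so shorter tiles
    # do not leave a dark band underneath them.
    canvas = np.full((height, total_width), background, dtype=np.uint8)

    x = 0
    for t in tiles:
        h, w = t.shape
        canvas[:h, x:x + w] = t
        x += w
    return canvas
```

The result can be wrapped with `Image.fromarray(canvas)` before being passed to pytesseract.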
Emil