
I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:

[Image: photo of the Sudoku puzzle]

I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:

  1. Tesseract is large, contains lots of functionality I don't need (i.e. full text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to only look for digits using `tesseract.setVariable("tessedit_char_whitelist", "123456789");`
  2. Tesseract often misinterprets a single digit as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:

[Image: examples of digits from the Sudoku that Tesseract misread]

I have three questions:

  1. Is there any way I can overcome the limitations of Tesseract?
  2. If not, what is a useful, accurate method to detect individual digits (not k-nearest neighbours) that would be feasible to implement on Android? This could be a free library or a DIY solution.
  3. How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
1''
  • That's an awesome idea for an app. If you've finished it, could you put up a link for it? By the way, I think that having a boundary around your digit images would be helpful. Simply make an image with 2 pixels more in both the height and width dimensions, use the outside for a black boundary, and put the original image in the middle. – bballdave025 Feb 20 '20 at 15:40
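The border suggestion in the comment above can be sketched in a few lines. This is only an illustration in plain Python, assuming the digit image is a list of rows of grayscale values; the function name `add_border` and its parameters are hypothetical. A 1-pixel frame on every side gives the "2 pixels more" in each dimension that the comment describes. In a real pipeline, OpenCV's `copyMakeBorder` would likely be the more natural tool.

```python
def add_border(image, border=1, value=0):
    """Return a copy of `image` surrounded by a frame of `border` pixels.

    `image` is a list of rows of pixel values; `value` is the fill
    colour of the frame (0 = black).
    """
    width = len(image[0]) if image else 0
    padded_width = width + 2 * border
    # Rows of pure border above the original image.
    padded = [[value] * padded_width for _ in range(border)]
    # Original rows, with border pixels added on the left and right.
    for row in image:
        padded.append([value] * border + list(row) + [value] * border)
    # Rows of pure border below the original image.
    padded.extend([value] * padded_width for _ in range(border))
    return padded


# A tiny 2x2 "digit" image, framed with a 1-pixel black border.
digit = [[255, 255],
         [255, 0]]
framed = add_border(digit)
```

The original image is left untouched, so the same cell can be padded with different border widths while experimenting.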

2 Answers


I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...

1. Get some hand-labeled training data.
2. Run Histogram of Oriented Gradients (HOG) on the training data, and produce one
    long, concatenated feature vector per image.
3. Feed each image's HOG features and its label into an SVM.
4. For test data (digits on a Sudoku puzzle), run HOG on the digits, then ask
    the SVM to classify the HOG features from the Sudoku puzzle.

OpenCV has a HOGDescriptor object, which computes HOG features. Look at this paper for advice on how to tune your HOG feature parameters. Any SVM library should do the job; the CvSVM class that comes with OpenCV should be fine.
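To make the feature-extraction step (steps 2 and 4 above) more concrete, here is a rough, simplified sketch of a HOG-style descriptor in plain Python. It is not OpenCV's HOGDescriptor: it skips overlapping-block normalization and simply L2-normalizes the whole vector, and the name `hog_features` and the parameters `cell_size` and `bins` are assumptions made for illustration. The resulting vectors are what would be passed, together with their labels, to an SVM.

```python
import math


def hog_features(image, cell_size=4, bins=9):
    """Simplified HOG: one unsigned-orientation histogram per cell.

    `image` is a list of rows of grayscale values. The per-cell
    histograms are concatenated and the whole vector is L2-normalized
    (real HOG normalizes over overlapping blocks instead).
    """
    height, width = len(image), len(image[0])
    cells_y, cells_x = height // cell_size, width // cell_size
    hist = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    bin_width = 180.0 / bins
    for y in range(cells_y * cell_size):
        for x in range(cells_x * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle // bin_width) % bins
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[y // cell_size][x // cell_size][b] += magnitude
    vector = [v for row in hist for cell in row for v in cell]
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


# Two 8x8 test images: a vertical edge and a horizontal edge.
vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]
horizontal_edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]

v = hog_features(vertical_edge)
h = hog_features(horizontal_edge)
```

The vertical edge puts all of its energy into the 0-degree bin of each cell and the horizontal edge into the 90-degree bin, which is the kind of difference the SVM learns to separate.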

For training data, I recommend using the MNIST handwritten digit database, which has 70,000 pictures of digits (60,000 for training and 10,000 for testing) with ground-truth labels.

A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)

solvingPuzzles
  • +1 - How much data is needed for training? Wouldn't it be better to train on data collected from many Sudoku puzzles rather than a handwritten digit database, since his test data will look almost the same, without the variation found in handwriting? – Abid Rahman K Nov 10 '12 at 08:10
  • Very useful post! I'm pleased to hear that what I've done so far is harder than what I have to do next :). – 1'' Nov 10 '12 at 17:13
  • Unfortunately, HOGDescriptor is made for Nvidia GPUs and doesn't exist in OpenCV4Android. [The source code](https://github.com/Itseez/opencv/blob/master/modules/gpu/src/hog.cpp) and [documentation](http://docs.opencv.org/2.4.3rc/modules/gpu/doc/object_detection.html) both make reference to a CPU version but it doesn't seem to exist anywhere. What would you suggest I do? – 1'' Nov 10 '12 at 17:14
  • Is all of the CPU OpenCV code available in CV4Android? If so, then HOGDescriptor does exist; I've used it. If you check out (`git clone`) the OpenCV source tree, you can see the CPU HOG code here: `opencv/modules/objdetect/src`. For an example HOGDescriptor application, see `opencv/samples/cpp/peopledetect.cpp`. However, instead of using the people detector functionality, you'll want `HOGDescriptor hog; hog.compute(...)` to get the actual HOG descriptors. Once you get this working, look at how to put custom parameters into the HOGDescriptor constructor. – solvingPuzzles Nov 10 '12 at 17:36
  • @AbidRahmanK Ah, I was thinking that the OP also wants to recognize hand-drawn numbers in the sudoku too. The HOG+SVM strategy should work for 'computer font' digits too though. Just train on the font that you're trying to recognize. (Or, even better, train on hand-labeled camera images of the font you want to recognize.) – solvingPuzzles Nov 10 '12 at 17:42
  • Thanks, I got HoG working! Ideally, I'd just use pictures of Sudoku puzzles to make the dataset, but I don't have enough of them. Instead, I think I'll try this: type the digits 1-9 in 20-30 fonts that might be used in Sudokus; take a screenshot; isolate the digits with OpenCV (like @AbidRahmanK did in [this post](http://stackoverflow.com/a/9620295/1397061)); apply HoG and SVM; and save the model to a file using [svm.save()](http://docs.opencv.org/modules/ml/doc/statistical_models.html#cvstatmodel-save). Do you have any suggestions? Are there any caveats I should be aware of? – 1'' Nov 10 '12 at 21:01
  • @solvingPuzzles 100% accuracy, with default parameters :):):):):) – 1'' Nov 13 '12 at 05:38
  • @solvingPuzzles: Can you guide me on developing playing-card recognition using the Android camera? I have posted a question, but I am not sure whether I am heading in the right direction. I would be thankful for any suggestions on [this question](http://stackoverflow.com/questions/29072000/opencv4android-template-matching-using-camera) – Mehul Joisar Mar 17 '15 at 07:41

The easiest approach is to use normalized central moments for digit recognition. It works well if you have one font (or very similar fonts).

See this solution: https://github.com/grzesiu/Sudoku-GUI

The code in core handles digit recognition, extraction, and moment training. The first time the application is run, the operator must tell it which number is shown. The moments of the image (the extracted square ROI) are then assigned to that number (the operator's input). The application recognizes digits by comparing moments.
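As a rough illustration of the idea (not the code from the linked repository), the sketch below computes the normalized central moments of a small image and picks the stored digit whose moments are closest. The function names, the choice of moments up to order 3, and the squared Euclidean distance are all assumptions made for this example.

```python
def normalized_central_moments(image, max_order=3):
    """Return {(p, q): eta_pq} for 2 <= p + q <= max_order.

    `image` is a list of rows of non-negative pixel intensities.
    eta_pq = mu_pq / m00 ** (1 + (p + q) / 2), where mu_pq is the
    central moment taken about the intensity centroid.
    """
    m00 = float(sum(sum(row) for row in image))
    if m00 == 0:
        raise ValueError("image is empty (all pixels are zero)")
    # Intensity centroid of the image.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue
            mu = sum((x - cx) ** p * (y - cy) ** q * v
                     for y, row in enumerate(image)
                     for x, v in enumerate(row))
            moments[(p, q)] = mu / m00 ** (1 + (p + q) / 2.0)
    return moments


def classify(image, templates):
    """Return the label whose stored moments are closest to the image's."""
    feats = normalized_central_moments(image)

    def distance(label):
        return sum((feats[k] - templates[label][k]) ** 2 for k in feats)

    return min(templates, key=distance)


# "Training": the operator labels one example glyph per digit.
one = [[0, 0, 1, 0, 0] for _ in range(5)]
seven = [[1, 1, 1, 1, 1],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
templates = {"1": normalized_central_moments(one),
             "7": normalized_central_moments(seven)}

# The same stroke shifted one pixel to the right is still read as "1".
shifted_one = [[0, 0, 0, 1, 0] for _ in range(5)]
label = classify(shifted_one, templates)
```

Because the moments are taken about the centroid, shifting the glyph inside its cell does not change them, which is why imperfect cropping of the Sudoku cells matters less with this approach.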

The first YouTube video here shows how the application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/

krzych