I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. Here are the broad strokes of my algorithm:
- Apply an adaptive threshold to the puzzle image
- Dilate to reduce the number of contours to consider
- Find the contours of the puzzle and warp it to a square
- Divide the square into 81 equal cells; look for cells with at least 20% white pixels
- In each of these cells, find the white blob closest to the centre and get its bounding rectangle
- Use character recognition (k-nearest neighbours/Tesseract/etc.) on the portion of the image inside the bounding rectangle
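To make the 20% test in the fifth step concrete, here is a minimal sketch in plain Java (no OpenCV, so it runs on its own). The class and method names (`CellOccupancy`, `whiteFraction`, `candidateCells`) are hypothetical, and it assumes the warped square is stored as an `int[][]` whose side is a multiple of 9, with nonzero meaning white. My real code works on a `Mat`, so treat this as an illustration of the idea rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags grid cells whose share of white pixels meets a threshold. */
final class CellOccupancy {

    /** Fraction of nonzero (white) pixels in cell (row, col) of a square binary image. */
    static double whiteFraction(int[][] img, int row, int col) {
        int cell = img.length / 9;            // assumption: side length is a multiple of 9
        int white = 0;
        for (int y = row * cell; y < (row + 1) * cell; y++) {
            for (int x = col * cell; x < (col + 1) * cell; x++) {
                if (img[y][x] != 0) {
                    white++;
                }
            }
        }
        return (double) white / (cell * cell);
    }

    /** Indices (row * 9 + col) of cells with at least minFraction white pixels. */
    static List<Integer> candidateCells(int[][] img, double minFraction) {
        List<Integer> hits = new ArrayList<>();
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                if (whiteFraction(img, r, c) >= minFraction) {
                    hits.add(r * 9 + c);
                }
            }
        }
        return hits;
    }
}
```

Calling `CellOccupancy.candidateCells(img, 0.20)` returns the cells that go on to the blob and recognition steps. The test only counts pixels, so it cannot tell a digit from any other white content in the cell.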
Although I can remove the thick outer border of the Sudoku puzzle using a simple floodfill, the inner gridlines are not contiguous, even after dilation, and cannot be removed so easily. For illustration, here is a sample Sudoku after removing the outer grid lines:
Problem: sometimes enough gridline fragments remain in a cell that more than 20% of its pixels are white, so I misdetect that cell as containing a digit. Here is an example of such a cell:
I've considered unwarping the image to reduce the visibility of the inner gridlines. I could use a Hough Transform or the method described in this post to find the gridlines as a prelude to unwarping. However, I don't see any other significant benefits to unwarping, and it should be both safer and easier to just remove the gridlines entirely.
Alternatively, I could modify my pre-processing so that the inner gridlines remain intact. Currently my pre-processing is:
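// Blur to suppress noise before thresholding
Imgproc.GaussianBlur(mat, mat, new Size(11, 11), 0);
// Inverted adaptive threshold: dark ink (gridlines, digits) becomes white
Imgproc.adaptiveThreshold(mat, matBW, 255,
        Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY_INV, 5, 2);
// Dilate with a 3x3 cross to reconnect the outer gridlines
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_CROSS, new Size(3, 3));
Imgproc.dilate(matBW, matBW, kernel);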
The Gaussian blur is needed to reduce noise before thresholding. The dilation makes sure the outer gridlines are connected, but it is not enough to reconnect the inner lines.
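To show why a single pass is not enough: a 3x3 cross grows each white region by one pixel up, down, left and right, so along a line it can close a gap of at most two pixels. The sketch below reimplements that dilation in plain Java on an `int[][]` (nonzero meaning white) so the effect can be checked without OpenCV. The class name `CrossDilate` is hypothetical, and out-of-bounds neighbours are simply ignored, which I am assuming is close enough to the default border handling for this purpose.

```java
/** Hypothetical plain-Java stand-in for one dilation pass with a 3x3 cross kernel. */
final class CrossDilate {

    /** A pixel becomes white (255) if it or any of its four axis neighbours is white. */
    static int[][] dilate(int[][] src) {
        int h = src.length;
        int w = src[0].length;
        int[][] dst = new int[h][w];
        int[] dy = {0, 1, -1, 0, 0};   // centre, down, up, right, left
        int[] dx = {0, 0, 0, 1, -1};
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                for (int k = 0; k < 5; k++) {
                    int ny = y + dy[k];
                    int nx = x + dx[k];
                    boolean inside = ny >= 0 && ny < h && nx >= 0 && nx < w;
                    if (inside && src[ny][nx] != 0) {
                        dst[y][x] = 255;
                        break;         // one white neighbour is enough
                    }
                }
            }
        }
        return dst;
    }
}
```

Applied to a one-row line with a two-pixel break, the break closes; with a three-pixel break, the middle pixel stays black and the line is still in two pieces. Wider breaks would need a larger kernel or more passes, which also thickens the digits.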
How can I consistently remove the inner gridlines, without affecting the rest of the image?
Many thanks.